
Title: Method for preparing polypeptide variants 
FIELD OF THE INVENTION

The present invention relates to a method for preparing polypeptide
variants by in vivo recombination.

BACKGROUND OF THE INVENTION

The advantages of producing biologically active polypeptides by
cloning naturally occurring DNA sequences from microorganisms, such
as fungal organisms and bacteria, using recombinant DNA technology
have been known for quite some years.

The preparation of novel polypeptide variants and mutants, such as
novel modified enzymes with altered characteristics, e.g. specific
activity, substrate specificity, pH optimum, pI, Km, etc., has
especially in recent years been used diligently and successfully
for obtaining polypeptides with improved properties.

For instance, within the technical field of enzymes the washing
and/or dishwashing performance of e.g. proteases, lipases, amylases
and cellulases has been improved significantly.

In most cases these improvements have been obtained by site-directed
mutagenesis resulting in substitution, deletion or insertion of
specific amino acid residues which have been chosen either on the
basis of their type or on the basis of their location in the
secondary or tertiary structure of the mature enzyme (see for
instance US patent no. 4,518,584).

An alternative general approach for modifying proteins and enzymes
has been based on random mutagenesis, for instance as disclosed in
US 4,894,331 and WO 93/01285.

As it is a cumbersome and time-consuming process to obtain
polypeptide variants or mutants with improved functional properties,
a few alternative methods for rapid preparation of modified
polypeptides have been suggested.

Weber et al., (1983), Nucleic Acids Research, vol. 11, 5661-5669,
describes a method for modifying genes by in vivo recombination
between two homologous genes. A linear DNA sequence comprising a
plasmid vector flanked by a DNA sequence encoding alpha-1 human
interferon at the 5'-end and a DNA sequence encoding alpha-2 human
interferon at the 3'-end is constructed and transfected into a rec A
positive strain of E. coli. Recombinants were identified and
isolated using a resistance marker.

Pompon et al., (1989), Gene 83, p. 15-24, describes a method for
shuffling gene domains of mammalian cytochrome P-450 by in vivo
recombination of partially homologous sequences in Saccharomyces
cerevisiae by transforming Saccharomyces cerevisiae with a linearized
plasmid with filled-in ends, and a DNA fragment being partially
homologous to the ends of said plasmid.

Stemmer, (1994), Proc. Natl. Acad. Sci. USA, Vol. 91, 10747-10751,
and Stemmer, (1994), Nature, vol. 370, 389-391, concern methods for
shuffling homologous DNA sequences by an in vitro PCR method. One
cycle of shuffling consists of digesting a pool of homologous genes
with DNase I. The resulting small fragments are reassembled into
full-length genes. Positive recombinant genes containing shuffled
DNA sequences are selected from a DNA library based on their
improved function. Positive recombinants can be used as the starting
material for (an)other shuffling round(s).

US patent no. 5,093,257 (Assignee: Genencor Int. Inc.) discloses a
method for producing hybrid polypeptides by in vivo recombination.
Hybrid DNA sequences are produced by forming a circular vector
comprising a replication sequence, a first DNA sequence encoding the
amino-terminal portion of the hybrid polypeptide, and a second DNA
sequence encoding the carboxy-terminal portion of said hybrid
polypeptide. The circular vector is transformed into a rec positive
microorganism in which the circular vector is amplified. This
results in recombination of said circular vector mediated by the
naturally occurring recombination mechanism of the rec positive
microorganism, which includes prokaryotes such as Bacillus and E.
coli, and eukaryotes such as Saccharomyces cerevisiae.

Despite the existence of the above methods there is still a need for
even better iterative in vivo recombination methods for preparing
novel positive polypeptide variants.

SUMMARY OF THE INVENTION 

The object of the present invention is to provide an improved method 
for preparing positive polypeptide variants by an in vivo 
recombination method. 

The inventor of the present invention has surprisingly found that
such positive polypeptide variants may advantageously be prepared by
shuffling different nucleotide sequences of homologous DNA sequences
by in vivo recombination comprising the steps of

a) forming at least one circular plasmid comprising a DNA sequence 
encoding a polypeptide, 

b) opening said circular plasmid(s) within the DNA sequence(s)
encoding the polypeptide(s),

c) preparing at least one DNA fragment comprising a DNA sequence
homologous to at least a part of the polypeptide coding region on at
least one of the circular plasmid(s),

d) introducing at least one of said opened plasmid(s), together with
at least one of said homologous DNA fragment(s) covering full-length
DNA sequences encoding said polypeptide(s) or parts thereof, into a
recombination host cell,

e) cultivating said recombination host cell, and 

f) screening for positive polypeptide variants. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 shows the yeast expression plasmid pJS026 comprising DNA
sequence encoding the Humicola lanuginosa lipase gene.

Figure 2 shows the yeast expression plasmid pJS037, comprising DNA
sequence encoding the Humicola lanuginosa lipase gene containing
twelve additional restriction sites.

Figure 3 shows the plasmid pJS026. 

Figure 4 shows the plasmid pJS037. 

Figure 5 shows the in vivo recombination of the 0.9 kb synthetic
wild-type Humicola lanuginosa lipase with pJS037 using Saccharomyces
cerevisiae as the recombination host cell (described in Example 1).

Figure 6 shows the in vivo recombination of a DNA fragment prepared
from Humicola lanuginosa lipase variant (y) with Humicola lanuginosa
lipase variant (d) comprised in a plasmid using Saccharomyces
cerevisiae as the recombination host cell (described in Example 2).

Figure 7 shows an overview of the location of the inactivation
site of the Humicola lanuginosa lipase gene and the number of the
clone (referred to as "blue number" in the tables). Location of
restriction enzyme sites and clone numbers are relative to the
initiation codon of the lipase gene. In all cases a stop codon was
located in the new reading frame 10 to 50 bp from the frameshift.

Figure 8 shows an overview of the creation of active Humicola
lanuginosa lipase genes from the recombinations in Table 2A and B
by a "mosaic mechanism". Lines indicate the introduction of the
fragment sequence into the vector and lines with an x indicate
sequences that are not introduced in the active lipase colonies.
The primers used for the PCR fragment are shown together with the
location of the frameshift mutation (marked by the restriction site
used for the construction).

Figure 9 shows an overview of fragments used in the recombination
of 2 partially overlapping fragments into a gapped vector. The
primers used for the PCR fragments are shown together with the
location of the frameshift mutation (if not wild type).

Figure 10 shows an overview of fragments used in the recombination
of 3 partially overlapping fragments into a gapped vector. The
primers used for the PCR fragments are shown. The overlap between
PCR353 and 355 is only 10 bp.

DETAILED DESCRIPTION OF THE INVENTION 

The object of the present invention is to provide an improved method 
for preparing positive polypeptide variants by an iterative in vivo 
recombination method. 

The inventor of the present invention has surprisingly found an
efficient method for shuffling homologous DNA sequences in an in
vivo recombination system using a eukaryotic cell as a recombination
host cell.

A "recombination host cell" is in the context of the present
invention a cell capable of mediating shuffling of a number of
homologous DNA sequences.

The term "shuffling" means recombination of nucleotide sequence(s)
between two or more homologous DNA sequences resulting in output DNA
sequences (i.e. DNA sequences having been subjected to a shuffling
cycle) having a number of nucleotides exchanged, in comparison to
the input DNA sequences (i.e. starting point homologous DNA
sequences).
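By way of illustration only, the definition of shuffling given above
can be sketched in Python. The two short input sequences, the function
names crossover and exchanged_positions, and the model of a single
exchanged stretch are assumptions made for this sketch; they are not
taken from the examples of the invention, and recombination in a host
cell may introduce several exchanges per molecule.

```python
def crossover(vector_seq: str, fragment_seq: str, start: int, end: int) -> str:
    """Replace vector_seq[start:end] by the same stretch of the fragment."""
    if len(vector_seq) != len(fragment_seq):
        raise ValueError("this sketch assumes inputs of equal length")
    return vector_seq[:start] + fragment_seq[start:end] + vector_seq[end:]


def exchanged_positions(input_seq: str, output_seq: str) -> list[int]:
    """Return the 0-based positions at which output and input differ."""
    if len(input_seq) != len(output_seq):
        raise ValueError("this sketch assumes sequences of equal length")
    pairs = zip(input_seq.upper(), output_seq.upper())
    return [i for i, (a, b) in enumerate(pairs) if a != b]


# Hypothetical homologous input sequences (they differ at positions 5, 11, 17).
vector_seq = "ATGAGGAGCTCCCTTGTGCTG"
fragment_seq = "ATGAGAAGCTCACTTGTCCTG"

# One shuffling event: positions 3..11 are taken over from the fragment.
output_seq = crossover(vector_seq, fragment_seq, 3, 12)

print(exchanged_positions(vector_seq, output_seq))    # nucleotides exchanged
print(exchanged_positions(fragment_seq, output_seq))  # retained from the vector
```

Comparing the output sequence with each input in turn shows which
nucleotides originate from which input sequence.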

An important advantage of the invention is that mosaic DNA sequences
with multiple replacement points or replacements, not related to the
opening site, are created, which is not seen in Pompon's method.

Another important advantage of the present invention is that using
a mixture of fragments and opened vectors (in the screening set-up)
makes it possible for many different clones to recombine pairwise or
even triplewise (as can be seen in a couple of the examples below).

The in vivo recombination method of the invention is simple to
perform and results in a high level of mixing of homologous genes or
variants. A large number of variants or homologous genes can be
mixed in one transformation. The mixing of improved variants or wild
type genes followed by screening increases the number of further
improved variants manyfold compared to doing only random
mutagenesis.

Recombination of multiple overlapping fragments is possible with a
high efficiency, increasing the mixing of variants or homologous
genes using the in vivo recombination method. An overlap as small as
10 bp is sufficient for recombination, which may be utilized for very
easy domain shuffling of even distantly related genes.

The invention relates to a method for preparing polypeptide variants
by shuffling different nucleotide sequences of homologous DNA
sequences by in vivo recombination comprising the steps of



WO 97/07205 

PCT/DK96/00343 



a) forming at least one circular plasmid comprising a DNA sequence
encoding a polypeptide,

b) opening said circular plasmid(s) within the DNA sequence(s)
encoding the polypeptide(s),

c) preparing at least one DNA fragment comprising a DNA sequence
homologous to at least a part of the polypeptide coding region on at
least one of the circular plasmid(s),

d) introducing at least one of said opened plasmid(s), together with
at least one of said homologous DNA fragment(s) covering full-length
DNA sequences encoding said polypeptide(s) or parts thereof, into a
recombination host cell,

e) cultivating said recombination host cell, and

f) screening for positive polypeptide variants.

According to the invention more than one cycle of steps a) to f) may
be performed.

The opening of the plasmid(s) in step b) can be directed toward any
site within the polypeptide coding region of the plasmid. The
plasmid(s) may be opened by any suitable method known in the art.
The opened ends of the plasmid may be filled in with nucleotides as
described in Pompon et al. (1989), supra. It is preferred not to
fill in the opened ends, as this might create a frameshift.

It is preferred to open the plasmid(s) around the middle of the
polypeptide coding DNA sequence(s), as this is believed to result in
a more effective recombination between DNA fragment(s) and opened
plasmid(s).

In an embodiment of the invention the DNA fragment(s) is (are)
prepared under conditions resulting in a low, medium or high random
mutagenesis frequency.

To obtain a low mutagenesis frequency the DNA sequence(s) (comprising
the DNA fragment(s)) may be prepared by a standard PCR amplification
method (US 4,683,202 or Saiki et al., (1988), Science 239, 487-491).

A medium or high mutagenesis frequency may be obtained by performing
the PCR amplification under conditions which increase the
misincorporation of nucleotides, for instance as described by
Deshler, (1992), GATA 9(4), 103-106; Leung et al., (1989), Technique,
Vol. 1, No. 1, 11-15.
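Purely as an illustration of what a low, medium or high random
mutagenesis frequency means for a fragment, the following Python
sketch copies a template while substituting each base independently
with a given probability. The per-base rates (0.001, 0.01 and 0.05),
the template sequence and the function names are assumptions chosen
for this sketch; they are not values from the cited references, and
the model ignores insertions, deletions and any mutational bias of a
real polymerase.

```python
import random


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Substitute each base independently with probability `rate`."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie between 0 and 1")
    out = []
    for base in seq.upper():
        if rng.random() < rate:
            # Pick one of the other bases with equal probability.
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def mutation_count(parent: str, child: str) -> int:
    """Number of positions at which the copy differs from the template."""
    return sum(a != b for a, b in zip(parent.upper(), child.upper()))


template = "ATGAGGAGCTCCCTTGTGCTGTTC" * 10  # hypothetical 240 bp template
rng = random.Random(0)                       # fixed seed for repeatability
for label, rate in [("low", 0.001), ("medium", 0.01), ("high", 0.05)]:
    copy = mutate(template, rate, rng)
    print(label, mutation_count(template, copy), "substitutions in", len(template), "bp")
```

The expected number of substitutions is simply the rate multiplied by
the fragment length, so longer fragments accumulate more changes at
the same frequency.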

It is also contemplated according to the invention to combine the 
PCR amplification (i.e. according to this embodiment also DNA 
fragment mutation) with a mutagenesis step using a suitable physical 
or chemical mutagenizing agent, e.g., one which induces transitions, 
transversions, inversions, scrambling, deletions, and/or insertions. 

In the context of the present invention the term "positive
polypeptide variants" means resulting polypeptide variants possessing
functional properties which have been improved in comparison to the
polypeptides producible from the corresponding input DNA sequences.
Examples of such improved properties can be as different as e.g.
biological activity, enzyme washing performance, antibiotic
resistance etc.

Consequently, which screening method is to be used for identifying
positive variants depends on the desired improved property of the
polypeptide variant in question.

If, for instance, the polypeptide in question is an enzyme and the
desired improved functional property is the wash performance, the
screening in step f) may conveniently be performed by use of a
filter assay based on the following principle:

The recombination host cell is incubated on a suitable medium and 
under suitable conditions for the enzyme to be secreted, the medium 
being provided with a double filter comprising a first protein- 
binding filter and on top of that a second filter exhibiting a low 
protein binding capability. The recombination host cell is located 
on the second filter. Subsequent to the incubation, the first filter 
comprising the enzyme secreted from the recombination host cell is 
separated from the second filter comprising said cells. The first 
filter is subjected to screening for the desired enzymatic activity 
and the corresponding microbial colonies present on the second 
filter are identified. 

The filter used for binding the enzymatic activity may be any
protein binding filter, e.g. nylon or nitrocellulose. The top filter
carrying the colonies of the expression organism may be any filter
that has no or low affinity for binding proteins, e.g. cellulose
acetate or Durapore®. The filter may be pre-treated with any of the
conditions to be used for screening or may be treated during the
detection of enzymatic activity.

The enzymatic activity may be detected by a dye, fluorescence,
precipitation, pH indicator, IR-absorbance or any other known
technique for detection of enzymatic activity.

The detecting compound may be immobilized by any immobilizing agent
e.g. agarose, agar, gelatine, polyacrylamide, starch, filter paper,
cloth; or any combination of immobilizing agents.

If the improved functional property of the polypeptide is not 
sufficiently good after one cycle of shuffling, the polypeptide may 
be subjected to another cycle. 

In an embodiment of the invention at least one shuffling cycle is a
backcrossing cycle with the initially used DNA fragment, which may
be the wild-type DNA fragment. This eliminates non-essential
mutations. Non-essential mutations may also be eliminated by using
wild-type DNA fragments as the initially used input DNA material.

It is to be understood that the method of the invention is suitable
for all types of polypeptide, including enzymes such as proteases,
amylases, lipases, cutinases, cellulases, peroxidases and
oxidases.

Also contemplated according to the invention are polypeptides having
biological activity such as insulin, ACTH, glucagon, somatostatin,
somatotropin, thymosin, parathyroid hormone, pigmentary hormones,
somatomedin, erythropoietin, luteinizing hormone, chorionic
gonadotropin, hypothalamic releasing factors, antidiuretic hormones,
thyroid stimulating hormone, relaxin, interferon, thrombopoietin
(TPO) and prolactin.

Especially contemplated according to the present invention is
initially to use input DNA sequences being either wild-type, variant
or modified DNA sequences, such as DNA sequences coding for
wild-type, variant or modified enzymes, respectively, in particular
enzymes exhibiting lipolytic activity.

In an embodiment of the invention the lipolytic activity is a lipase
activity derived from filamentous fungi of the Humicola sp., in
particular Humicola lanuginosa.

In a specific embodiment of the invention the initially used input
DNA fragment to be shuffled with a homologous polypeptide is the
wild-type DNA sequence encoding the Humicola lanuginosa lipase
derived from Humicola lanuginosa DSM 4109 described in EP 305 216
(Novo Nordisk A/S).

Also specifically encompassed by the scope of the invention are input
DNA sequences selected from the group of vectors (a) to (f) and/or
DNA fragments (g) to (aa) coding for Humicola lanuginosa lipase
variants from the list below in the Material and Method section.

Throughout the present application the name Humicola lanuginosa has
been used to identify one preferred parent enzyme, i.e. the one
mentioned immediately above. However, in recent years H. lanuginosa
has also been termed Thermomyces lanuginosus (a species introduced
the first time by Tsiklinsky in 1899) since the fungus shows
morphological and physiological similarity to Thermomyces
lanuginosus. Accordingly, it will be understood that whenever
reference is made to H. lanuginosa this term could be replaced by
Thermomyces lanuginosus. The DNA encoding part of the 18S ribosomal
gene from Thermomyces lanuginosus (or H. lanuginosa) has been
sequenced. The resulting 18S sequence was compared to other 18S
sequences in the GenBank database and a phylogenetic analysis using
parsimony (PAUP, Version 3.1.1, Smithsonian Institution, 1993) has
also been made. This clearly assigns Thermomyces lanuginosus to the
class of Plectomycetes, probably to the order of Eurotiales.
According to the Entrez Browser at the NCBI (National Center for
Biotechnology Information), this relates Thermomyces lanuginosus to
families like Eremascaceae, Monoascaceae, Pseudoeurotiaceae and
Trichocomaceae, the latter containing genera like Emericella,
Aspergillus, Penicillium, Eupenicillium, Paecilomyces, Talaromyces,
Thermoascus and Sclerocleista.

Consequently, such genes encoding lipolytic enzymes of filamentous
fungi of the genera Emericella, Aspergillus, Penicillium,
Eupenicillium, Paecilomyces, Talaromyces, Thermoascus and
Sclerocleista are also specifically contemplated according to the
present invention.

Examples of relevant filamentous fungi genes encoding lipolytic
enzymes include strains of the Absidia sp., e.g. the strains listed
in WO 96/13578 (from Novo Nordisk A/S), which are hereby incorporated
by reference. Absidia sp. strains [...].

Strains of Rhizopus sp., in particular Rh. niveus and [...], are
also contemplated according to the invention.

[...] bacteria, such as a strain of the Pseudomonas sp., in
particular Ps. fragi, Ps. stutzeri, Ps. cepacia and Ps. fluorescens
(WO 89/04361), or Ps. plantarii or Ps. gladioli (US 4,950,417), or
Ps. alcaligenes and Ps. pseudoalcaligenes (EP 218 272, EP 331 376, or
WO 94/25578 [...] variants of the [...] lipolytic enzyme), the
Pseudomonas sp. variants disclosed in EP 407 225, or a [...]
lipolytic enzyme, such as the Ps. mendocina (also [...]) lipolytic
enzyme described in WO 88/09367 and US 5,389,536 or variants thereof
as described in US 5,352,594, or [...] Ps. wisconsinensis (WO
96/12012 from Solvay), or a strain of Bacillus sp., e.g. the B.
subtilis strain described by Dartois et al., (1993), Biochimica et
Biophysica Acta 1131, 253-260, or B. stearothermophilus (JP
64/7744992) or B. pumilus (WO 91/16422), or a strain of Streptomyces
sp., e.g. S. scabies, or a strain of Chromobacterium sp., e.g. C.
viscosum.

In connection with the Pseudomonas sp. lipases it has been found
that lipases from the following organisms have a high homology, such
as at least 60% homology, at least [...] homology [...], and thus are
contemplated to belong to the same family of lipases: Ps. ATCC 21808,
Pseudomonas sp. lipase commercially available as Liposam®, Ps.
aeruginosa EF2, Ps. aeruginosa PAC1R, Ps. aeruginosa PAO1, Ps.
aeruginosa TE 3285, Ps. sp. 109, Ps. pseudoalcaligenes M1, Ps.
glumae, Ps. cepacia DSM 3959, Ps. cepacia M-12-33, Ps. sp. KWI-56,
Ps. putida IFO 3458, Ps. putida IFO 12049 (Gilbert, J., (1993),
Pseudomonas lipases: Biochemical properties and molecular cloning.
Enzyme Microb. Technol., 15, 634-645). The species Pseudomonas
cepacia has recently been reclassified as Burkholderia cepacia, but
is termed Ps. cepacia in the present application.

Also genes encoding lipolytic enzymes from yeasts are relevant, and
include lipolytic genes from Candida sp., in particular Candida
rugosa, or Geotrichum sp., in particular Geotrichum candidum.

Specific examples of microorganisms comprising genes encoding
lipolytic enzymes used for commercially available products and which
may serve as donors of genes to be shuffled according to the
invention include Humicola lanuginosa, used in Lipolase® and
Lipolase® Ultra, Ps. mendocina used in Lumafast®, Ps. alcaligenes
used in Lipomax®, Fusarium solani, Bacillus sp. (US 5427936, EP
528828), and Ps. mendocina, used in Liposam®.

Also the Pseudomonas sp. lipase gene shown in SEQ ID NO 14 is
specifically contemplated according to the invention.

It is to be emphasized that genes encoding lipolytic enzymes to be
shuffled according to the invention may be any of the above
mentioned genes of lipolytic enzymes and any variant, modification,
or truncation thereof. Examples of such genes which are specifically
contemplated include the genes encoding the enzymes described in WO
92/05249, WO 94/01541, WO 94/14951, WO 94/25577, WO 95/22615 and
protein engineered lipase variants as described in EP 407 225; a
protein engineered Ps. mendocina lipase as described in US
5,352,594; a cutinase variant as described in WO 94/14964; a variant
of an Aspergillus lipolytic enzyme as described in EP patent
167,309; and Pseudomonas sp. lipase described in WO 95/06720.

A requirement for the DNA sequences, encoding the polypeptide(s), to
be shuffled is that they are at least 60%, preferably at least 70%,
better more than 80%, especially more than 90%, and even better up
to almost 100% homologous. DNA sequences being less homologous will
have less inclination to interact and recombine.
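As a minimal sketch of how such a homology threshold might be
checked, the Python example below computes the percentage of
identical positions between two sequences that have already been
aligned. Interpreting "homologous" as percent identity over a
pre-computed alignment, the gap symbol "-", the example sequences and
the function names are all assumptions made for illustration; the
present text does not prescribe a particular homology calculation.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical positions between two pre-aligned sequences.

    Columns in which both sequences carry a gap ('-') are ignored;
    a gap opposite a base counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, minimum: float = 60.0) -> bool:
    """True if the pair reaches the minimum percent identity."""
    return percent_identity(aligned_a, aligned_b) >= minimum


# Hypothetical aligned sequences differing at two of ten positions.
seq_a = "ATGCATGCAT"
seq_b = "ATGAATGCTT"
print(percent_identity(seq_a, seq_b))         # 80.0
print(meets_threshold(seq_a, seq_b, 60.0))    # True
print(meets_threshold(seq_a, seq_b, 90.0))    # False
```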



It is also contemplated according to the invention to shuffle parent
(homologous) wild type organisms of different genera.

The DNA fragment(s) preferably have a length of from about 20 bp to
8 kb, preferably about 40 bp [...], more preferred about 80 bp to
4 kb, especially about 100 bp to 2 kb, to be able to interact
optimally with the opened plasmid.

The method of the invention is very efficient for preparing
polypeptide variants in comparison to prior art methods comprising
transforming linear DNA fragments/sequences.

The inventor found that the transformation frequency of an opened
plasmid and a DNA fragment was significantly [...] when transforming
a plasmid cut at the same site alone. The transformation frequency
of the opened [...] DNA fragment was as high as for uncut plasmid.

Without being limited to any theory it is believed that the opening
of the plasmid(s) restrict(s) the [...] of the plasmid(s) when not
interacting with at least one DNA fragment. In accordance with this
an increased [...] after only one shuffling cycle [...].

As described in Example 1 [...] sequences, DNA sequences encoding
variants or mutants, or modifications thereof, such as extended or
elongated DNA sequences, and may also be the outcome of DNA sequences
having been subjected to any of the methods described in the prior
art section.

[...] the method of the invention the output [...] have a number of
nucleotides exchanged. [...] within the polypeptide variant [...]
comparing it with the parent polypeptide. It is to be understood that
silent mutations are also contemplated (i.e. nucleotide exchange
which does not result in changes in the amino acid sequence).
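The distinction between a silent nucleotide exchange and one that
changes the amino acid sequence can be illustrated with the short
Python sketch below. It assumes the standard genetic code, sequences
of equal length that start in frame, and hypothetical nine-nucleotide
example sequences; the function name codon_changes is likewise an
assumption made for this illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_changes(parent: str, variant: str) -> list[tuple[int, str, str]]:
    """List (codon index, parent residue, variant residue) for changed codons."""
    if len(parent) != len(variant) or len(parent) % 3 != 0:
        raise ValueError("expects in-frame sequences of equal length")
    changes = []
    for i in range(0, len(parent), 3):
        p = parent[i:i + 3].upper()
        v = variant[i:i + 3].upper()
        if p != v:
            changes.append((i // 3, CODON_TABLE[p], CODON_TABLE[v]))
    return changes


parent = "ATGGCTCTG"    # hypothetical parent: Met-Ala-Leu
variant = "ATGGCCATG"   # hypothetical variant: Met-Ala-Met

for index, before, after in codon_changes(parent, variant):
    kind = "silent" if before == after else "amino acid replaced"
    print(index, before, after, kind)
```

In this example the exchange in the second codon (GCT to GCC) is
silent, whereas the exchange in the third codon replaces leucine by
methionine.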

However, the method of the present invention will in most cases lead
to the replacement of a considerable number of amino acids and may in
certain cases even alter the structure of one or more polypeptide
domains (i.e. a folded unit of polypeptide structure).

According to the present invention more than two DNA sequences may be
shuffled at the same time. Actually any number of different DNA
fragments and homologous polypeptides comprised in suitable plasmids
may be shuffled at the same time. This is advantageous as a vast
number of quite different variants can be made rapidly without an
abundance of iterative procedures.

The inventor has tested the nucleotide shuffling method of the
invention using significantly more than two homologous DNA
sequences. As described in Example 2 it was surprisingly found that
the method of the invention can advantageously be used for
recombining more than two DNA sequences.

One cycle of shuffling according to the method of the invention may
result in the exchange of from 1 to 1000 nucleotides into the opened
plasmid DNA sequence encoding the polypeptide in question. The
exchanged nucleotide sequence(s) may be continuous or may be present
as a number of sub-sequences within the full-length sequence(s).

To support the present invention the inventor made a number of
additional experiments on different aspects of the method of the
invention. The experiments are described below and illustrated in
Examples 3 to 6 below.

A number of vectors and fragments comprising inactivated
synthetic Humicola lanuginosa lipase genes were constructed by
introducing frameshift/stop codon mutations in the lipase gene at
various positions. These were used for monitoring the in vivo
recombination of different combinations of opened vector(s) and DNA
fragments. The number of active lipase colonies was scored as
described in Example 3. The number of colonies determines the
efficiency of the opened vector(s) and fragment(s) recombination.
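The principle of inactivating a gene by a frameshift that brings a
stop codon into the new reading frame, and of scoring the proportion
of active colonies, can be sketched in Python as follows. The short
open reading frame, the inserted base and the colony counts are
hypothetical values chosen for illustration and are not the
constructs or results of the examples.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(seq: str, start: int = 0):
    """Index of the first in-frame stop codon from `start`, or None."""
    seq = seq.upper()
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def insert_bases(seq: str, position: int, bases: str) -> str:
    """Insert `bases` before index `position` (a frameshift unless a multiple of 3)."""
    return seq[:position] + bases + seq[position:]


def percent_active(active_colonies: int, total_colonies: int) -> float:
    """Active colonies as a percentage of all colonies scored."""
    if total_colonies <= 0:
        raise ValueError("total_colonies must be positive")
    return 100.0 * active_colonies / total_colonies


orf = "ATGGCATTGACCGTTAAGGCTTAA"    # hypothetical intact reading frame
shifted = insert_bases(orf, 3, "C")  # one inserted base shifts the frame

print(first_stop(orf))       # stop codon at the natural end of the frame
print(first_stop(shifted))   # premature stop codon in the new frame
print(percent_active(32, 100))
```

In this constructed case the premature stop codon lies 12 bp
downstream of the inserted base, so the shifted gene encodes only a
truncated product unless recombination restores the original frame.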
One frameshift mutation in said Humicola lanuginosa lipase gene in
the opened vector and another in the fragment on the opposite side
of the opening site gave 3 to 32% of active lipase colonies
depending on the location and combination. It was concluded that
the closer the mutation is to the ends of the vector, the higher
the mixing.

One frameshift mutation in the opened vector and two in the
fragment on each side of the opening site gave 4 to 42% of active
colonies depending on the location and combination. Some of the
active colonies can be considered to be mosaics, not only related
to the opening site.

Two frameshift mutations in the opened vector on each side of the
opening site and one in the fragment gave 0.5 to 3.1% of active
colonies depending on the location and combination. Most of these
active colonies are mosaics of the "parent" DNA.

Two frameshift mutations in the opened vector on each side of the
opening site and a wild type fragment gave 7.7 to 10.7% of active
colonies depending on the location.

It was also found that the amount of vectors relative to fragments
and the size of the fragments also influence the result.

Use of Saccharomyces cerevisiae rad52 mutants as the recombination
host cell showed that the rad52 mutant transformed very well with
wild type plasmid(s) and expressed the Humicola lanuginosa lipase,
but gave no transformants at all with the opened vectors and
fragments.

The RAD52 function is required for "classical recombination" (but
not for unequal sister-strand mitotic recombination), showing that
the recombination of opened vector and fragment could involve a
classical recombination mechanism.

Classical recombination is the recombination mechanism involved in
the recombination between genes located on nonsister chromatids of
homologous chromosomes; see Petes TD, Malone RE and Symington LS
(1991), "Recombination in Yeast", pages 407-522 in The Molecular and
Cellular Biology of the Yeast Saccharomyces, Volume 1 (eds. Broach
JR, Pringle JR and Jones EW), Cold Spring Harbor Laboratory Press,
New York.

Multiple partially overlapping fragments

The inventor also tested recombination of multiple partially
overlapping fragments using the method of the invention.

The recombination of 2 and 3 partially overlapping fragments into a
gapped vector (i.e. a vector in which the opening results in cutting
out a small part of the gene) was tested and gave a high recovery of
recombined Humicola lanuginosa lipase gene. The recovery of active
lipase gene from different combinations of inactivated Humicola
lanuginosa genes was tested for the recombination of 2 partially
overlapping fragments. The tendency was a higher mixing in the
overlapping region between the 2 fragments in the gapped region
than in the vector and fragment overlap.

When recombining many fragments from the same region, the multiple
overlapping fragment technique will increase the mixing by itself,
but it is also important to have a relatively high random mixing in
overlapping regions in order to mix closely located
variants/differences.

An overlap as small as 10 bp between two fragments was found to be
sufficient to obtain a very efficient recombination. Therefore,
overlaps in the range from 5 to 5000 bp, preferably from 10 bp to
500 bp, especially 10 bp to 100 bp, are suitable according to the
method of the invention.

According to this embodiment of the present invention 2 or more
overlapping fragments, preferably 2 to 6 overlapping fragments,
especially 2 to 4 overlapping fragments, may advantageously be used
as input fragments in a shuffling cycle.

Besides increasing the mixing of genes, this is a very useful
method for domain shuffling by creating small overlaps between DNA
fragments from different domains and screening for the best
combination.

For instance, in the case of three DNA fragments the overlapping
regions may be as follows:

- the first end of the first fragment overlaps the first end of the
opened plasmid,

- the first end of the second fragment overlaps the second end of
the first fragment, and the second end of the second fragment
overlaps the first end of the third fragment,

- the first end of the third fragment overlaps (as stated above) the
second end of the second fragment, and the second end of the third
fragment overlaps the second end of the opened plasmid.

It is to be understood that when using two or more DNA fragments as
starting material it is preferred to have continuous overlaps between
the ends of the plasmid and the DNA fragments.
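The arrangement of continuous overlaps described above for three
fragments can be checked with a sketch such as the following Python
example. It assumes that all sequences are given on the same strand
in 5' to 3' order, that an overlap is an exact match between the end
of one sequence and the start of the next, and that 10 bp is used as
the minimum overlap; the sequences and function names are
hypothetical. Recombination in a host cell may well tolerate
mismatches within an overlap, which this simple check does not model.

```python
def end_overlap(left: str, right: str, min_len: int = 10) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    left, right = left.upper(), right.upper()
    for n in range(min(len(left), len(right)), min_len - 1, -1):
        if left[-n:] == right[:n]:
            return n
    return 0


def chain_overlaps(upstream_end: str, fragments: list[str],
                   downstream_end: str, min_len: int = 10) -> list[int]:
    """Overlap lengths along plasmid end -> fragments -> other plasmid end."""
    chain = [upstream_end, *fragments, downstream_end]
    return [end_overlap(a, b, min_len) for a, b in zip(chain, chain[1:])]


# Hypothetical overlap regions of 10, 12 and 13 bp.
ov1, ov2, ov3 = "ACGTACGTAG", "TTGACCATGGCA", "CCGGAATTCTAGA"
upstream_end = "GGGCCC" + ov1        # first end of the opened plasmid
fragment_1 = ov1 + "AAATTT" + ov2
fragment_2 = ov2 + "CACACA" + ov3
downstream_end = ov3 + "TTTGGG"      # second end of the opened plasmid

overlaps = chain_overlaps(upstream_end, [fragment_1, fragment_2], downstream_end)
print(overlaps)
print("continuous" if all(overlaps) else "gap in the chain")
```

A zero in the returned list marks a junction at which no sufficient
overlap exists, i.e. a break in the chain between the plasmid ends.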

Even though it is preferred to shuffle homologous DNA sequences in
the form of DNA fragment(s) and opened plasmid(s), it is also
contemplated according to the invention to shuffle two or more
opened plasmids comprising homologous DNA sequences encoding
polypeptides. However, in such a case it is compulsory to open the
plasmids at different sites.

In a further embodiment of the invention two or more opened
plasmids and one or more homologous DNA fragments are used as the
starting material to be shuffled. The ratio between the opened
plasmid(s) and homologous DNA fragment(s) preferably lies in the
range from 20:1 to 1:50, preferably from 2:1 to 1:10 (mol vector:mol
fragments) with the specific concentrations being from 1 pM to 10 M
of the DNA.
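
The molar ratio of vector to fragment can be estimated from the DNA
masses and lengths. The sketch below is illustrative only; it assumes
an average mass of about 650 g/mol per base pair of double-stranded
DNA (a common approximation, not a value from this document), and the
function names and the example amounts are hypothetical.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair of dsDNA; approximate, assumed value


def picomoles(mass_ng: float, length_bp: int) -> float:
    """Convert a mass of dsDNA (ng) of a given length (bp) to picomoles."""
    return mass_ng * 1e-9 / (length_bp * AVG_BP_MASS) * 1e12


def vector_to_fragment_ratio(vector_ng, vector_bp, fragment_ng, fragment_bp):
    """Molar ratio (mol vector : mol fragment) as a single number."""
    return picomoles(vector_ng, vector_bp) / picomoles(fragment_ng, fragment_bp)


def ratio_class(ratio: float) -> str:
    """Compare a molar ratio with the ranges stated in the text."""
    if 1 / 10 <= ratio <= 2:
        return "preferred (2:1 to 1:10)"
    if 1 / 50 <= ratio <= 20:
        return "within stated range (20:1 to 1:50)"
    return "outside stated range"


if __name__ == "__main__":
    # Hypothetical example: 100 ng of a 6000 bp vector, 500 ng of a 900 bp fragment.
    r = vector_to_fragment_ratio(100, 6000, 500, 900)
    print(round(r, 3), ratio_class(r))
```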

The opened plasmids may advantageously be gapped in such a way that
the overlap between the fragments is deleted in the vector (in order
to select for the recombination).

Preparing the DNA fragment

The DNA fragment to be shuffled with the homologous polypeptide
comprised in an opened plasmid may be prepared by any suitable
method. For instance, the DNA fragment may be prepared by PCR
amplification (polymerase chain reaction), as described above, of a
plasmid or vector comprising the gene of the polypeptide, using
specific primers, for instance as described in US 4,683,202 or Saiki
et al., (1988), Science 239, 487 - 491. The DNA fragment may also be
cut out from a vector or plasmid comprising the desired DNA sequence
by digestion with restriction enzymes, followed by isolation using
e.g. electrophoresis.

The DNA fragment encoding the homologous polypeptide in question may
alternatively be prepared synthetically by established standard
methods, e.g. the phosphoramidite method described by Beaucage and
Caruthers, (1981), Tetrahedron Letters 22, 1859 - 1869, or the
method described by Matthes et al., (1984), EMBO Journal 3, 801 -
805. According to the phosphoramidite method, oligonucleotides are
synthesized, e.g. in an automatic DNA synthesizer, purified,
annealed, ligated and cloned in suitable vectors.

Furthermore, the DNA fragment may be of mixed synthetic and genomic,
mixed synthetic and cDNA or mixed genomic and cDNA origin prepared
by ligating fragments of synthetic, genomic or cDNA origin (as
appropriate), the fragments corresponding to various parts of the
entire DNA sequence, in accordance with standard techniques.

The plasmid 

The plasmid comprising the DNA sequence encoding the polypeptide in
question may be prepared by ligating said DNA sequence into a
suitable vector or plasmid, or by any other suitable method.

Said vector may be any vector which may conveniently be subjected to
recombinant DNA procedures. The choice of vector will often depend
on the recombination host cell into which it is to be introduced.

Thus, the vector may be an autonomously replicating vector, i.e. a
vector which exists as an extrachromosomal entity, the replication
of which is independent of chromosomal replication, e.g. a plasmid.
Alternatively, the vector may be one which, when introduced into the
recombination host cell, is integrated into the host cell genome and
replicated together with the chromosome(s) into which it has been
integrated.

To facilitate the screening process it is preferred that the vector
is an expression vector in which the DNA sequence encoding the
polypeptide in question is operably linked to additional segments
required for transcription of the DNA. In general, the expression
vector is derived from a plasmid, a cosmid or a bacteriophage, or
may contain elements of any or all of these.

The term "operably linked" indicates that the segments are arranged
so that they function in concert for their intended purposes, e.g.
transcription initiates in a promoter and proceeds through the DNA
sequence coding for the polypeptide in question.

The promoter may be any DNA sequence which shows transcriptional
activity in the host cell of choice and may be derived from genes
encoding proteins, such as enzymes, either homologous or
heterologous to the host cell.

Examples of suitable promoters for use in yeast host cells include
promoters from yeast glycolytic genes (Hitzeman et al., (1980), J.
Biol. Chem. 255, 12073 - 12080; Alber and Kawasaki, (1982), J. Mol.
Appl. Gen. 1, 419 - 434) or alcohol dehydrogenase genes (Young et
al., in Genetic Engineering of Microorganisms for Chemicals
(Hollaender et al., eds.), Plenum Press, New York, 1982), or the TPI1
(US 4,599,311) or ADH2-4c (Russell et al., (1983), Nature 304, 652 -
654) promoters.

Examples of suitable promoters for use in filamentous fungus host
cells are, for instance, the ADH3 promoter (McKnight et al., (1985),
The EMBO J. 4, 2093 - 2099) or the tpiA promoter. Examples of other
useful promoters are those derived from the gene encoding A. oryzae
TAKA amylase, Rhizomucor miehei aspartic proteinase, A. niger
neutral α-amylase, A. niger acid stable α-amylase, A. niger or A.
awamori glucoamylase (gluA), Rhizomucor miehei lipase, A. oryzae
alkaline protease, A. oryzae triose phosphate isomerase or A.
nidulans acetamidase. Preferred are the TAKA-amylase and gluA
promoters.

The DNA sequence encoding the polypeptide in question may
also, if necessary, be operably connected to a suitable terminator,
such as the human growth hormone terminator (Palmiter et al., op.
cit.) or (for fungal hosts) the TPI1 (Alber and Kawasaki, op. cit.)
or ADH3 (McKnight et al., op. cit.) terminators. The vector may
further comprise elements such as polyadenylation signals (e.g. from
SV40 or the adenovirus 5 Elb region), transcriptional enhancer
sequences (e.g. the SV40 enhancer) and translational enhancer
sequences (e.g. the ones encoding adenovirus VA RNAs).

The vector may further comprise a DNA sequence enabling the vector
to replicate in the recombination host cell in question.
When the host cell is a yeast cell, suitable sequences enabling the
vector to replicate are the yeast plasmid 2μ replication genes REP
1-3 and origin of replication.

The plasmid pYl can be used for production of useful proteins and
peptides, using filamentous fungi, such as Aspergillus sp., and
yeasts as recombinant host cells (JP06245777-A).

The vector may also comprise a selectable marker, e.g. a gene the
product of which complements a defect in the recombination host
cell, such as the gene coding for dihydrofolate reductase (DHFR) or
the Schizosaccharomyces pombe TPI gene (described by P.R. Russell,
(1985), Gene 40, 125-130).

Other examples of such suitable selective markers are the ura3 and
leu2 genes, which complement the corresponding defective genes of
e.g. the yeast strain Saccharomyces cerevisiae YNG318.

The vector may also comprise a selectable marker which confers
resistance to a drug, e.g. ampicillin, kanamycin, tetracycline,
chloramphenicol, neomycin, hygromycin or methotrexate. For
filamentous fungi, selectable markers include amdS, pyrG, argB, niaD,
sC, trpC, pyr4, and DHFR.

To direct the polypeptide in question into the secretory pathway of
the recombination host cell, a secretory signal sequence (also known
as a leader sequence, prepro sequence or pre sequence) may be
provided in the recombinant vector. The secretory signal sequence is
joined to the DNA sequence encoding the lipolytic enzyme in the
correct reading frame. Secretory signal sequences are commonly
positioned 5' to the DNA sequence encoding the polypeptide. The
secretory signal sequence may be that normally associated with the
polypeptide in question or may be from a gene encoding another
secreted protein.

The signal peptide may be a naturally occurring signal peptide, or a
functional part thereof, or it may be a synthetic peptide. For
secretion from yeast cells, suitable signal peptides have been found
to be the α-factor signal peptide (cf. US 4,870,008), the signal
peptide of mouse salivary amylase (cf. O. Hagenbuchle et al.,
(1981), Nature 289, 643-646), a modified carboxypeptidase signal
peptide (cf. L.A. Valls et al., (1987), Cell 48, 887-897), the
Humicola lanuginosa lipase signal peptide, the yeast BAR1 signal
peptide (cf. WO 87/02670), or the yeast aspartic protease 3 (YAP3)
signal peptide (cf. M. Egel-Mitani et al., (1990), Yeast 6, 127-






For efficient secretion in yeast, a sequence encoding a leader
peptide may also be inserted downstream of the signal sequence and
upstream of the DNA sequence encoding the polypeptide in question.
The function of the leader peptide is to allow the expressed
polypeptide to be directed from the endoplasmic reticulum to the
Golgi apparatus and further to a secretory vesicle for secretion
into the culture medium (i.e. export of the polypeptide across
the cell wall or at least through the cellular membrane into the
periplasmic space of the yeast cell). The leader peptide may be the
yeast α-factor leader (the use of which is described in e.g. US
4,546,082, EP 16 201, EP 123 294, EP 123 544 and EP 163 529).
Alternatively, the leader peptide may be a synthetic leader
peptide, which is to say a leader peptide not found in nature.
Synthetic leader peptides may, for instance, be constructed as
described in WO 89/02463 or WO 92/11378.

For use in filamentous fungi, the signal peptide may conveniently be
derived from a gene encoding an Aspergillus sp. amylase or
glucoamylase, a gene encoding a Rhizomucor miehei lipase or
protease, or a Humicola lanuginosa lipase. The signal peptide is
preferably derived from a gene encoding A. oryzae TAKA amylase, A.
niger neutral α-amylase, A. niger acid-stable amylase, or A. niger
glucoamylase.

The recombination host cell 

The recombination host cell, into which the mixture of
plasmid/fragment DNA sequences is to be introduced, may be any
eukaryotic cell, including fungal cells and plant cells, capable of
recombining the homologous DNA sequences in question.

According to the prior art, prokaryotic microorganisms, such as
bacteria including Bacillus and E. coli; eukaryotic organisms, such
as filamentous fungi, including Aspergillus, and yeasts such as
Saccharomyces cerevisiae; and tissue culture cells of avian or
mammalian origin have been suggested for in vivo recombination. All
of said organisms can be used as recombination host cells, but in
general prokaryotic cells are not sufficiently effective (i.e. do
not result in a sufficient number of variants) to be suitable for
recombination methods for industrial use.

Consequently, preferred recombination host cells according to the
present invention are fungal cells, such as yeast cells or
filamentous fungi.

Examples of suitable yeast cells include cells of Saccharomyces sp.,
in particular strains of Saccharomyces cerevisiae or Saccharomyces
kluyveri, or Schizosaccharomyces sp. Methods for transforming yeast
cells with heterologous DNA and producing heterologous polypeptides
therefrom are described, e.g. in US 4,599,311, US 4,931,373, US
4,870,008, US 5,037,743, and US 4,845,075, all of which are hereby
incorporated by reference. Transformed cells may be selected by,
e.g., a phenotype determined by a selectable marker, commonly drug
resistance or the ability to grow in the absence of a particular
nutrient, e.g. leucine. A preferred vector for use in yeast is the
POT1 vector disclosed in US 4,931,373. The DNA sequence encoding the
polypeptide may be preceded by a signal sequence and optionally a
leader sequence, e.g. as described above. Further examples of
suitable yeast cells are strains of Kluyveromyces, such as K.
lactis, Hansenula, e.g. H. polymorpha, or Pichia, e.g. P. pastoris
(cf. Gleeson et al., (1986), J. Gen. Microbiol. 132, 3459-3465; US
4,882,279).

Examples of other fungal cells are cells of filamentous fungi, e.g.
Aspergillus sp., Neurospora sp., Fusarium sp. or Trichoderma sp., in
particular strains of A. oryzae, A. nidulans or A. niger. The use of
Aspergillus sp. for the expression of proteins is described in,
e.g., EP 272 277, EP 230 023. The transformation of F. oxysporum
may, for instance, be carried out as described by Malardier et al.,
(1989), Gene 78.



In a preferred embodiment of the invention the recombination host
cell is a cell of the genus Saccharomyces, in particular S.
cerevisiae.



METHODS AND MATERIALS 



DNA sequence:

Humicola lanuginosa DSM 4109 derived lipase encoding DNA sequence.

Humicola lanuginosa lipase variants:
Variants used for preparing vectors to be opened with NruI in
Example 2:

(a) E56R,D57L,I90F,D96L,E99K

(b) E56R,D57L,V60M,D62N,S83T,D96P,D102E

(c) D57G,N94K,D96L,L97M

(d) E87K,G91A,D96R,I100V,E129K,K237M,I252L,P256T,G263A,L264Q



Variants used for preparing DNA fragments by standard PCR
amplification in Example 2:

(g) S83T,N94K,D96N
(h) E87K,D96V
(i) N94K,D96A
(j) E87K,G91A,D96A
(k) D167G,E210V
(l) S83T,G91A,Q249R
(m) E87K,G91A
(n) S83T,E87K,G91A,N94K,D96N,D111N
(o) N73D,E87K,G91A,N94I,D96G
(p) L67P,I76V,S83T,E87N,I90N,G91A,D96A,K98R
(q) S83T,E87K,G91A,N92H,N94K,D96M
(s) S85P,E87K,G91A,D96L,L97V
(t) E87K,I90N,G91A,N94S,D96N,I100T
(u) I34V,S54P,F80L,S85T,D96G,R108W,G109V,D111G,S116P,L124S,
V132M,V140Q,V141A,F142S,H145R,N162T,I166V,F181P,F183S,
R205G,A243T,D254G,F262L
(v) E56R,D57L,I90F,D96L,E99K
(x) E56R,D57L,V60M,D62N,S83T,D96P,D102E
(y) D57G,N94K,D96L,L97M
(z) E87K,G91A,D96R,I100V,E129K,K237M,I252L,P256T,G263A,L264Q
(aa) E56R,D57G,S58F,D62C,T64R,E87G,G91A,F95L,D96P,K98I

Strains : 

Expression system host: 
Saccharomyces cerevisiae YNG318: MATa Dpep4[cir+] ura3-52, leu2-D2,
his 4-539

Saccharomyces cerevisiae Rad52: Strain M1533, MATa rad52 ura3,
obtained from Torsten Nilsson Tillgren, Institute of Genetics,
University of Copenhagen.


Plasmids : 

pJS026 (see figure 3) 
pJS037 (see figure 4) 
pYES 2.0 (Invitrogen) 


Transformation selective marker 

ura3 

leu2 

Media

SC-ura-: 90 ml 10 x Basal salt, 22.5 ml 20% casamino acids, 9 ml 1%
tryptophan, H2O ad 806 ml, autoclaved, 3.6 ml 5% threonine and 90 ml
20% glucose or 20% galactose added.

LB-medium: 10 g Bacto-tryptone, 5 g Bacto yeast extract, 10 g NaCl
in 1 litre water.

Brilliant Green (BG) (Merck, art. No. 1.01310)

BG-reagent: 4 mg/ml Brilliant Green (BG) dissolved in water

Substrate 1:

10 ml olive oil (Sigma CAT NO. 0-1500)
20 ml 2% polyvinyl alcohol (PVA)
The Substrate is homogenised for 15-20 minutes.

Methods:



Construction of yeast expression vector

The expression plasmids pJS026 and pJS037 are derived from pYES
2.0. The inducible GAL1-promoter of pYES 2.0 was replaced with the
constitutively expressed TPI (triose phosphate isomerase)-promoter
from Saccharomyces cerevisiae (Alber and Kawasaki, (1982), J. Mol.
Appl. Genet., 1, 419-434) and the ura3 promoter has been deleted. A
restriction map of pJS026 and pJS037 is shown in figure 3 and figure
4, respectively.



Preparation of the wild-type DNA fragment

A lipase wild-type DNA fragment can be prepared either by PCR
amplification (resulting in low, medium or high mutagenesis) of the
pJS026 plasmid or by digestion with restriction enzymes.



Fermentation of Humicola lanuginosa lipase in yeast

10 ml of SC-ura- medium is inoculated with a S. cerevisiae colony
and grown at 30°C for 2 days. The 10 ml is used for inoculating 300
ml SC-ura- medium which is grown at 30°C for 3 days. The 300 ml is
used for inoculation of 5 l of the following G-substrate:

400 g Amicase
6-7 g yeast extract (Difco)
12.5 g L-Leucin (Fluka)
6-7 g (NH4)2SO4
10 g MgSO4·7H2O
17 g K2SO4
10 ml Trace compounds
5 ml Vitamin solution
6.7 ml H3PO4
25 ml 20% Pluronic (antifoam)
In a total volume of 5000 ml.

The yeast cells are fermented for 5 days at 30 °C. The pH is kept
at 5.0 by addition of a 10% NH3 solution. Agitation is 300 rpm for
the first 22 hours followed by [?]00 rpm for the rest of the
fermentation. Air is given with 1 l air/l/min for the first 22 hours,
followed by 1.5 l air/l/min for the rest of the fermentation.



Trace compounds:
6.8 g ZnCl2
54.0 g FeCl2·6H2O
19.1 g MnCl2·4H2O
2.2 g CuSO4·5H2O
2.58 g CoCl2
0.62 g H3BO3
0.024 g (NH4)6Mo7O24·4H2O
0.2 g KI
100 ml HCl (concentrated)
In a total volume of 1 l.



Vitamin solution: 

250 mg Biotin 

3 g Thiamin 

10 g D-Calciumpanthetonat 

100 g Myo-Inositol 

50 g Cholinchlorid 

1.6 g Pyridoxin

1.2 g Niacinamid

0.4 g Folic acid

0.4 g Riboflavin

In a total volume of 1 l.



Transformation of yeast 

Saccharomyces cerevisiae is transformed by standard methods (cf.
Sambrook et al., (1989), Molecular Cloning: A Laboratory Manual,
2nd Ed., Cold Spring Harbor).

Determination of yeast transformation frequency 

The transformation frequency is determined by cultivating the
transformants on SC-ura- plates for 3 days and counting the number of
colonies appearing. The number of transformants per mg opened
plasmid is the transformation frequency.
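
The definition above amounts to a simple quotient. The following
sketch is only illustrative; the function name and the example
numbers are hypothetical, and the mass unit is taken as stated in the
text.

```python
def transformation_frequency(colony_count: int, opened_plasmid_mass: float) -> float:
    """Transformants per unit mass of opened plasmid.

    `opened_plasmid_mass` is given in the unit used in the text
    (transformants are reported per mg opened plasmid).
    """
    if opened_plasmid_mass <= 0:
        raise ValueError("plasmid mass must be positive")
    return colony_count / opened_plasmid_mass


if __name__ == "__main__":
    # Hypothetical example: 10 colonies obtained from 0.5 units of opened plasmid.
    print(transformation_frequency(10, 0.5))
```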



Screening for positive variants with improved wash performance

The following filter assay can be used for screening positive
variants with improved wash performance.

Low calcium filter assay

1) Provide SC Ura- replica plates (useful for selecting strains
carrying the expression vector) with a first protein binding filter
(Nylon membrane) and a second low protein binding filter (Cellulose
acetate) on the top.

2) Spread yeast cells containing a parent lipase gene or a mutated
lipase gene on the double filter and incubate for 2 or 3 days at
30°C.

3) Keep the colonies on the top filter by transferring the top
filter to a new plate.

4) Remove the protein binding filter to an empty petri dish.

5) Pour an agarose solution comprising an olive oil emulsion (2%
PVA:olive oil=3:1), Brilliant green (indicator, 0.004%), 100 mM tris
buffer pH 9 and EGTA (final concentration 5 mM) on the bottom filter
so as to identify colonies expressing lipase activity in the form of
blue-green spots.

6) Identify colonies found in step 5) having a reduced dependency
on calcium as compared to the parent lipase.

DNA sequencing was performed using an Applied Biosystems ABI DNA
sequencer model 373A according to the protocol in the ABI Dye
Terminator Cycle Sequencing kit.

Assessing the efficiency of recombination

The number of colonies determines the efficiency of the opened
vector and fragment recombination. The percentage of colonies with
active lipase activity gives an estimate of the mixing of the
active and inactive genes. Theoretically, assuming equal likelihood
of wild type and frameshift at each mutated position, the closer to
50% the better for one frameshift, 25% for 2 frameshifts and
12.5% for 3 frameshifts.
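
The theoretical values quoted above follow from assuming that each
frameshift position independently ends up as wild type or frameshift
with equal probability, so that a gene is active only if all positions
are wild type. A minimal sketch of this arithmetic (the function name
is hypothetical):

```python
def expected_active_fraction(n_frameshifts: int) -> float:
    """Expected fraction of active genes under ideal, independent mixing.

    Each of the n frameshift positions is assumed to be wild type with
    probability 1/2, so all n are wild type with probability (1/2)**n.
    """
    if n_frameshifts < 0:
        raise ValueError("number of frameshifts must be non-negative")
    return 0.5 ** n_frameshifts


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, f"{100 * expected_active_fraction(n):.1f}%")
```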

Frameshift mutation

The frameshift mutations were created either by filling in a
restriction site (in case of 5' overhang) or deleting the "sticky
ends" (in case of 3' overhang) by T4 DNA polymerase with or without
deoxynucleotides (dATP, dGTP, dTTP and dCTP). The procedures for
filling in restriction sites and for deleting sticky ends (as
referred to on Figure 7) are well known in the art.

Method for assessing colonies with lipase activity

The number of colonies and positives (i.e. with lipase activity) are
calculated as the average of 3 plates.
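
The averaging over plates may be sketched as follows. This is an
illustration only: the plate counts are hypothetical, and it is an
assumption that the percentage of positives is taken from the averaged
counts rather than averaged per plate.

```python
from statistics import mean


def summarize_plates(plates):
    """Average colony and positive counts over replicate plates.

    `plates` is a list of (colonies, positives) tuples, one per plate.
    Returns (average colonies, average positives, percent positives).
    """
    avg_colonies = mean(c for c, _ in plates)
    avg_positives = mean(p for _, p in plates)
    percent = 100 * avg_positives / avg_colonies if avg_colonies else 0.0
    return avg_colonies, avg_positives, percent


if __name__ == "__main__":
    # Hypothetical counts from 3 plates.
    print(summarize_plates([(100, 20), (110, 22), (90, 18)]))
```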

The cultivation and screening conditions used are the following:

1) Provide SC Ura- plates with a protein binding filter (Nylon
filter) onto the plate.

2) Spread yeast cells containing a parent lipase gene or a mutated
lipase gene on the filter and incubate for 3 or 4 days at 30 °C.

3) Remove the protein binding filter with the colonies to a petri
dish containing: An agarose solution comprising an olive oil
emulsion (2% PVA:olive oil=2:1), Brilliant green (indicator, 0.004%),
100 mM tris buffer pH 9.

5) Identify colonies expressing lipase activity in the form of
blue-green spots.

EXAMPLES 

Example 1 

Testing in vivo recombination of two homologous genes 

The Saccharomyces cerevisiae expression plasmid pJS026 was
constructed as described above in the "Materials and Methods"
section.

A synthetic Humicola lanuginosa lipase gene (in pJS037) containing
12 additional restriction sites (see figure 4) was cut with NruI,
PstI, and NruI and PstI, respectively, to open the gene
approximately in the middle of the DNA sequence encoding the lipase.

The opened plasmid (pJS037) was transformed into Saccharomyces
cerevisiae YNG318 together with an about 0.9 kb wild-type Humicola
lanuginosa lipase DNA fragment (see figure 1) prepared from pJS026
by PCR amplification.

Further, the opened plasmid was also transformed into the yeast
recombination host cell alone (i.e. without the 0.9 kb synthetic
lipase DNA fragment).

The transformed yeast cells were grown as described in the
"Materials and Methods" section above, and the transformation
frequency was determined as described above.

It was found that the transformation frequency of the opened plasmid
alone was very low (10 transformants per mg opened plasmid), in
comparison to the transformation frequency of said opened plasmid
together with the DNA fragment ([?]00,000 transformants per mg
opened plasmid).

PCR amplification of the recombined plasmid/fragment resulted in 20
transformants containing fragments covering the lipase gene region
of the recombined plasmid/fragments. The recombination mixture of
the 20 transformants was analyzed by restriction digestion using
standard methods. The result is displayed in Table 1.



Table 1

Restriction enzyme patterns of the PCR fragments from the 20
transformants P1 to P6, N1 to N6 and P/N1 to P/N8, determined for the
sites SphI, HindIII, PstI, NruI, BstXI, NheI, BstEII, KpnI and XhoI.
(The individual table entries, each wt, sg, nd or "not tested", are
not recoverable from the source layout.)

P: plasmid opened with PstI
N: plasmid opened with NruI
P/N: plasmid opened with PstI and NruI (resulting in the removal of
a 75 bp fragment)
wt: wild-type gene restriction enzyme pattern
sg: synthetic gene restriction enzyme pattern
nd: not determined


As can be seen from Table 1, 10 transformants (equivalent to 50%)
contained recombined DNA sequences. 4 of these 10 DNA sequences
(equivalent to 20%) contained either a region of the wild-type gene
recombined into the synthetic gene or a region of the synthetic gene
recombined into the wild-type fragment.

Example 2

In vivo recombination of Humicola lanuginosa lipase variants

The DNA sequences of 20 variants of the Humicola lanuginosa lipase
were in vivo recombined in the same mixture.

Six vectors were prepared from the lipase variants (a) to (f) (see
the list above) by ligation into the yeast expression vector pJS037.
All vectors were cut open with NruI.

DNA fragments of all 20 homologous DNA sequences (g) to (aa) (see the
list above) were prepared by PCR amplification using standard
methods.

The 20 DNA fragments and the 6 opened vectors were mixed and
transformed into the yeast Saccharomyces cerevisiae YNG318 by
standard methods. The recombination host cell was cultivated as
described above and screened as described above. About 20
transformants were isolated and tested for improved wash performance
using the filter assay method described in the "Materials and
Methods" section.

Two positive transformants (named A and B) were identified using the
filter assay.

In comparison to the wild-type amino acid sequence the two
recombined positive transformants had the following mutations.

A: D57G, N94K, D96L, P256T

A is a recombination of two variants.
----- originates from the vector (d)
===== originates from the DNA fragment prepared from variant (y)

B: D57G, G59V, N94K, D96L, L97M, S116P, S170P, N249R
???? <<<<< ????? =====

B is a recombination of vector (c), DNA fragments (n) and (u).
----- originates from the vector (c)
<<<<< originates from the DNA fragment prepared from variant (u)
===== originates from the DNA fragment prepared from variant (n)
???? Amino acid mutation which is not a result of recombination.

As can be seen the resulting positive variants have been formed by
recombination of two or more variants. The amino acid mutations
marked "????" are not a result of in vivo recombination, as none of
the shuffled lipase variants (see the list above) comprise any of
said mutations. Consequently, these mutations are a result of random
mutagenesis arisen during preparation of the DNA fragments by
standard PCR amplification.
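
The attribution of mutations to parent variants can be illustrated
with set operations. The sketch below uses transformant A together
with the variant lists (d)/(z) and (y) given in the "Materials and
Methods" section; the function names are hypothetical, and the code is
only an illustration of the reasoning, not a procedure used in the
examples.

```python
def parse(mutations: str) -> set:
    """Split a comma-separated mutation list into a set of mutation names."""
    return {m.strip() for m in mutations.split(",")}


def attribute(recombinant: set, parents: dict) -> dict:
    """Map each mutation of the recombinant to the parents that carry it.

    A mutation mapped to an empty list is not present in any listed
    parent, i.e. it cannot be explained by recombination alone.
    """
    return {
        m: sorted(name for name, muts in parents.items() if m in muts)
        for m in recombinant
    }


parents_a = {
    "vector (d)": parse("E87K,G91A,D96R,I100V,E129K,K237M,I252L,P256T,G263A,L264Q"),
    "fragment (y)": parse("D57G,N94K,D96L,L97M"),
}
recombinant_a = parse("D57G,N94K,D96L,P256T")

if __name__ == "__main__":
    for mutation, origin in sorted(attribute(recombinant_a, parents_a).items()):
        print(mutation, origin or "not from recombination")
```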



Example 3

Recombination with one frameshift mutation

The synthetic Humicola lanuginosa lipase gene (in vector pJS037) was
made inactive at various positions by deleting (positions 184/385)
or filling in (positions 290/317/518/746) restriction enzyme sites
or by site-directed introduction of a stop codon. All inactive
synthetic lipase genes of 900 bp can be deduced from Figure 7.

A number of different 900 bp DNA fragments were made from the above
vectors using primer 4699 and primer 5164 by standard PCR
technique. Smaller PCR fragments were made using primer 8487 and
primer 4548 (260 bp), and primer 2843 and primer 4548 (488 bp).

0.5 ml (app. 0.1 mg) of vectors Blue 425, Blue 426, Blue 428 and
Blue 429, opened with PstI (i.e. position 385), and vectors Blue 424
and Blue 425 opened with NruI (i.e. position 464), were together
with 3 ml (app. 0.5 mg) of fragments 424, 425, 426, 428, 429 in
various combinations transformed into 100 ml Saccharomyces cerevisiae
YNG318 competent cells as displayed in Table 1A.

The number of colonies and positives (i.e. with lipase activity)
were calculated as the average of 3 plates as described in the
Materials and Methods section.

The result of the test is shown in Table 1A.

Table 1A

Vector + Fragment            Number of    % of colonies with
                             colonies     active lipase activity
1. Blue 428 + 429¤           114          16%
2. Blue 429 + 428#           645          3%
3. Blue 426 + 425#           276          25%
4. Blue 425 + 426            528          18%
5. Blue 425/NruI + 426       539          28%
6. Blue 425 + 424            139          7%
7. Blue 424/NruI + 425¤      74           32%
8. Blue 428 + 425            81           12%
9. Blue 428 + wt fragment    317          37%

Pairwise recombinations of one frameshift mutation on the vector
and another on the fragment on the opposite side of the opening
site. ¤ determined by 9 plates; # determined by 6 plates.

The first 2 rows of Table 1A display vectors and fragments with a
frameshift on each side of the PstI site. The "mirror image"
experiment in row 2 compared to row 1 gives a reproducibly lower
number of active colonies. The same is true for rows 3 and 4 even
though it is not as pronounced. Moving the opening site closer to
the frameshift in the vector increases the number of actives as
seen in row 5. This can explain the reason for the difference in
the "mirror image" experiments. In both cases the higher number of
positives has the opening site closer to the frameshift in the
vector.

It can therefore be concluded that the closer the mutation is to
the end of the vector the higher the chance of mixing. This probably
arises from the well known fact that free DNA ends have a high
recombinogenic potential. Therefore it is desirable to have as many
free DNA ends as possible to increase the mixing of the genes. This
is for example obtained in the later example with recombination of
multiple overlapping fragments.

Row 6 has a rather low number of actives, probably due to the
location of the frameshift on the fragment exactly at the PstI
opening site of the vector.

Row 7 has the frameshift of the vector close to the opening site
and again it gives a high number of actives.

Recombination with one stop codon mutation

In order to test if there is any difference in the recombination
efficiency of stop codon mutations compared to frameshift mutations
the following experiments were made.

In the same way as described above 0.5 ml (app. 0.1 mg) of vectors
Blue 624, Blue 625 and Blue 626 (see Table 1B) opened with PstI,
comprising stop codons at specified positions (positions 184, 317
and 746, respectively) (prepared by site-directed mutagenesis), were
together with 3 ml (app. 0.5 mg) of fragments 624, 625 and 626
transformed into 100 ml Saccharomyces cerevisiae YNG318 competent
cells in various combinations as displayed in Table 1B.



Table 1B

Vector + Fragment       Number of    % of colonies with
                        colonies     lipase activity
1. Blue 626 + 624       ND           40%
2. Blue 624 + 626       ND           12%
3. Blue 625 + 624       ND           75%
4. Blue 624 + 625       ND           10%

Pairwise recombinations of one stop codon mutation on the vector
and another on the fragment on the opposite side of the opening
site. ND: not determined but a high number.

Rows 1 and 2 (in Table 1B) have the mutations located at the same
place as rows 1 and 2 in Table 1A. As can be seen the number of
colonies with lipase activity is clearly higher for the stop codon
mutations compared to the frameshift mutations, but the same
relative difference between the "mirror image" experiments is seen.

This might indicate that the stop codon mutations, which are closer
to the practical application of the method, give a better mixing
than frameshift mutations. Rows 3 and 4 confirm that the closer the
mutation is to the end of the vector the higher the chance of mixing.

Recombination with one or two frameshift mutations in the vector
and one or two frameshift mutations in the fragment

Using the same approach as described above the influence of one or
two frameshift mutations in the vector and one or two frameshift
mutations in the fragment was tested using vectors Blue 425, 426
and 428 (one mutation) and vectors Blue 442 and Blue 443 (two
mutations), fragments 442 and 443 (two frameshift mutations),
fragments 424, 425, 426, 427, 428 (one mutation) and wild-type
(no mutation).

The vectors Blue 442 and 443 are double frameshift mutations: Blue
442 = 428+429 and Blue 443 = 427+429 (see Figure 7).

Recombination was performed by transforming 0.5 ml vector (app. 0.1
mg) opened with PstI and 3 ml PCR-fragment (app. 0.5 mg) into 100
ml Saccharomyces cerevisiae YNG318 competent cells.

The result of the test is shown in Table 2A and Table 2B.



Table 2A

Vector + Fragment       Number of    % of colonies with
                        colonies     active Lipolase
1. Blue 425 + 442       142          15%
2. Blue 425 + 443       144          14%
3. Blue 426 + 442       42           42%
4. Blue 426 + 443#      77           20%
5. Blue 428 + 443       115          3.8%

One frameshift mutation on the vector and two on the fragment on
each side of the opening site. # determined by 6 plates.



Table 2B

Vector + Fragment            Number of    % of colonies with
                             colonies     active Lipolase
Blue 442 + 424               137          0.5%
Blue 442 + 426               118          1.1%
Blue 442 + 427#              125          1.3%
Blue 443 + 425               540          2.5%
Blue 443 + wt fragment

Table 2A shows a rather high number of colonies with lipase
activity even with a total of 3 frameshifts (but only 1
frameshift on the vector), except for the last row, where the
frameshift on the vector is located far from the [...].
[...] has fewer actives than lane 3, probably due to [...]
the frameshift on the fragment, making the active genes mosaics that
are not related to the opening site (see figure [...]). In Table 2B a
very low number of actives is observed when there are 2
frameshifts located on the vector. Most of these active colonies
are mosaics of the "parent" [...], meaning that the mixing is not
related to the opening site (see figure 2B).

Recombination with two different vectors or fragments

The result of the recombination test with two different vectors or
fragments is shown in Table 3.



Table 3 



[...] colonies [...] in row [...] of Table 3, as expected. The
fragment added in the [...] row has
two frameshifts, each corresponding to the frameshift on each
vector. Via a tripartite recombination 4.2% actives are created.
With two fragments with one frameshift each and a vector with the
same two frameshifts, very few actives are found.



Recombination with vectors opened at different sites 

Opening the vector at one side instead of approximately in the
middle still gives good recombination, as shown in Table 4. Two
vectors opened at different sites can also recombine to some extent
(compare with the vector controls in table 13).



Table 4

Vector + Fragment              Number of colonies   % of colonies with active Lipolase
Blue 428/xho + 429             160                  11%
Blue 428/xho + Blue 429/pst#   35                   6.3%

Opening of the vector at one side instead of in the middle. #
determined by 6 plates.



Recombination at different concentrations of vector and fragment 

The relative concentration of vector to fragment does influence the
percentage of positive colonies, as can be seen in Table 5.



Table 5

Vector + Fragment                Number of colonies   % of colonies with lipase activity
0.5 µl Blue 426 + 3 µl 442       42                   42%
1.5 µl Blue 426 + 3 µl 442       21                   51%
1.5 µl Blue 426 + 9 µl 442       34                   26%
1.5 µl Blue 426 + 3 µl 427       230                  2.8%
1 µl Blue 442 + 1 µl 425         224                  1.16%
1 µl Blue 442 + 2 µl [...]       429                  0.9%
[...] Blue 442 + 4 µl 425        434                  1.6%
1 µl Blue 442 + 8 µl 425         481                  1.6%
1 µl Blue 442 + 16 µl 425        497                  2.0%

Varying the concentration of the vector or fragment.
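
Because Table 5 reports percentages alongside very different colony counts, it can help to convert each row into an approximate number of active colonies. The sketch below does this for the first four rows of Table 5; the function name and the rounding to whole colonies are illustrative choices, and the results are only estimates because the published percentages are themselves rounded.

```python
# Estimate absolute numbers of active colonies from Table 5 style data.
# The (colonies, percent) pairs are the first four rows of Table 5;
# the helper name and the rounding are assumptions made for illustration.

def estimated_active(colonies: int, percent_active: float) -> int:
    """Approximate count of colonies showing lipase activity."""
    return round(colonies * percent_active / 100)


rows = [
    ("0.5 ul Blue 426 + 3 ul 442", 42, 42.0),
    ("1.5 ul Blue 426 + 3 ul 442", 21, 51.0),
    ("1.5 ul Blue 426 + 9 ul 442", 34, 26.0),
    ("1.5 ul Blue 426 + 3 ul 427", 230, 2.8),
]

if __name__ == "__main__":
    for label, colonies, percent in rows:
        print(f"{label}: ~{estimated_active(colonies, percent)} active of {colonies}")
```

Seen this way, a row with many colonies but a low percentage can still yield fewer active clones than a row with few colonies and a high percentage.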



Recombination with fragments of different size

The size of the fragment also influences the recombination result 
as seen in Table 6. 



Table 6

Vector + Fragment            Number of colonies   % of colonies with active Lipolase
Blue 424 + 425 (260bp)       73                   34%
Blue 424 + 425 (489bp)       130                  45%
Blue 424 + 424 (480bp)       133                  0.3%
Blue 424 + 428 (480bp)       130                  36%
Blue 428 + 425 (480bp)       150                  28%
Blue 425 + 424 (480bp)       69                   0%
Blue 425 + 428 (480bp)       63                   55%

Recombination with smaller fragments than 900 bp.



Recombination with unopened vectors

Transformation with unopened vectors shows a very low degree of 
recombination (Table 7) . 



Table 7

Plasmid                  Number of colonies   % of colonies with active Lipolase
Blue 428 + Blue 429      887                  0.3%
Blue 426 + Blue 425      697                  0.7%

Recombination of unopened plasmids.



Example 4

Test of S. cerevisiae mutants altered in recombination

Using the same approach as described in Example 3, recombination of
opened and unopened vectors and fragments was tested using a
Saccharomyces cerevisiae rad52 mutant as the recombination host
cell. The result is displayed in Table 8.



Table 8

Vector + Fragment    Number of colonies   % of colonies with active Lipolase
Blue 428 + 429       0                    0
Blue 442 + 427       0                    0
Blue 424 + 425       0                    0
Blue 426 + 443       0                    0
Plasmid pJSO37       544                  100%

Recombination result in rad52 mutant.



The result with rad52 showed that recombination was completely
abolished. The RAD52 function is required for classical
recombination (but not for unequal sister-strand mitotic
recombination), showing that the recombination of opened vector and
fragment could involve a classical recombination mechanism.

Example 6

Recombination of multiple partial overlapping fragments

In order to increase the mixing of the mutations by the
recombination method of the invention, recombination of two
fragments and one gapped vector was attempted.

Table 15

Vector + Fragment                          Number of colonies   % of colonies with lipase activity
1. pJSO37/HindIII-XhoI + PCR319+PCR327     > 2000               100%
2. pJSO37/HindIII-XhoI + PCR321+PCR331     ≈ 2000               ≈ 0.2%

WO 97/07205 






PCT/DK96/00343 



3. pJSO37/HindIII-XhoI + PCR319+PCR331

5. pJSO37/HindIII-XhoI + PCR321+PCR386

6. Blue 428/HindIII-XhoI + PCR321+PCR331

9. Blue 428/HindIII-XhoI + PCR327+PCR385

10. Blue 429/HindIII-XhoI + PCR319+PCR386

11. Blue 429/HindIII-XhoI + PCR321+PCR386

12. Blue 442/HindIII-XhoI + PCR319+PCR327



400 



0.2% 




* 400 



13. Blue [...]/HindIII-XhoI +

14. Blue 429/HindIII-XhoI +

15. Blue 442/HindIII-XhoI +



* 350 



1500 



* 10% 



* 15% 



* 15% 



10% 




17. Blue 428/HindIII-XhoI + PCR321

Recombination result of two fragments and a gapped vector. The last
5 rows are controls.



As can be seen in Table 15, the recovery of the [...]
lipase gene is very efficient. The last 5 rows [...] show
that the opened vector alone, or with only [...] fragment not covering
the whole gap (see figure 3), gives only very few colonies [...].
[...] fragment [...] 10% of active
colonies [...].

[...] fragment PCR331 has the frameshift located at the
[...] site which, in this combination, is not covered by a wild
type fragment (see figure 3) and therefore gives about 0% of active
lipase. The same is the case for rows 3 and 6.

In row 4, fragment PCR386 contains a frameshift at the SphI
site, which is overlapped by wild type sequences in the gapped
vector. The frameshift was recombined into less than 10% of the
genes, which is lower than the result for one-fragment recombination
in the last row of Table 1A above.

In row 5 a rather high mixing is observed between the 2 fragments,
each containing a frameshift, and the wild type gapped vector, giving
25% active and 75% inactive lipase colonies. This is probably because
the fragment PCR321 has the frameshift in the overlap
between the 2 fragments and in the gapped region of the vector. If
fragment PCR386 contributes 10% inactives, as in row 4,
fragment PCR321 gives the remaining 65% inactives; therefore
PCR386 gives 35% wt in the overlap.
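
The bookkeeping behind the row 5 estimate can be written out step by step. The sketch below simply restates the arithmetic of the paragraph above (25% active, 10% inactives attributed to PCR386 by analogy with row 4); the function and variable names are illustrative and not part of the original method.

```python
# Worked arithmetic for the row 5 estimate in Table 15.
# Inputs are the percentages quoted in the text; names are illustrative.

def overlap_wild_type_share(active_pct: int, known_inactive_pct: int):
    """Split inactive colonies between two fragments and derive the wt share.

    active_pct: percentage of colonies with active lipase.
    known_inactive_pct: inactives attributed to the first fragment
        (here PCR386, taken from the row 4 result).
    """
    inactive_pct = 100 - active_pct                       # all inactive colonies
    remaining_inactive = inactive_pct - known_inactive_pct  # attributed to PCR321
    wild_type_in_overlap = 100 - remaining_inactive       # wt sequence in the overlap
    return inactive_pct, remaining_inactive, wild_type_in_overlap


if __name__ == "__main__":
    inactive, from_pcr321, wt_overlap = overlap_wild_type_share(25, 10)
    print(inactive, from_pcr321, wt_overlap)
```

The calculation assumes that the two sources of inactivation are simply additive, which is the same simplification the text makes.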

Row 7 is the "mirror image" of row 4, with the frameshift at the
SphI site on the vector (see Figure 7) and 2 wild type fragments,
giving an integration of the wild type fragment into more than 90%
of the vectors.

Row 8 shows, as in row 5, that the frameshift of PCR321 in the
overlap and gap region gives a very high number of inactives.

In row 9, fragment PCR385, with a frameshift in the vector overlap,
causes a very high number of inactives.

Row 10 gives a rather high number of inactives compared to rows 7
and 4. It is not increased in row 11.

Row 12 shows that two frameshifts on the vector give a lower
number of actives compared to one in row 7.

The recombination of 3 partial overlapping fragments into a gapped
vector is also very efficient, as seen in Table 16. The last row,
with the vector alone, gives very few colonies. As can be seen in
figure 4, all fragments used are wt. In the first row in Table 16
there are rather long overlaps between the vector and fragments,
but in the middle row the overlap between PCR353 and 355 is only 10
bp long and it is still very efficiently recombined. This
surprising result may be utilized for very easy domain shuffling of
even distantly related genes. For example, 3 different domains
of [...] different genes can be made as PCR fragments with
10 to 20 bp overlap by primer design and recombined together and
subsequently screened for the best combination (1000 possible
combinations).
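
The domain-shuffling idea can be illustrated with two small helpers: one that measures the terminal overlap between two fragments, and one that enumerates the possible domain combinations. This is a sketch under stated assumptions: the parent labels are hypothetical, and the choice of 10 parent genes is an assumption made only because 10 x 10 x 10 reproduces the 1000 combinations quoted above; the example sequences are invented.

```python
# Sketch of overlap checking and combination counting for domain shuffling.
# Parent labels, the number of parents (10) and the test sequences are
# hypothetical; only "3 domains" and "1000 combinations" come from the text.
from itertools import product


def overlap_length(upstream: str, downstream: str) -> int:
    """Longest suffix of `upstream` that is also a prefix of `downstream`."""
    longest = min(len(upstream), len(downstream))
    for size in range(longest, 0, -1):
        if upstream[-size:] == downstream[:size]:
            return size
    return 0


def domain_combinations(parents, n_domains=3):
    """Every way of choosing one parent gene for each domain position."""
    return list(product(parents, repeat=n_domains))


if __name__ == "__main__":
    parents = [f"gene{i}" for i in range(1, 11)]  # assumed 10 parent genes
    combos = domain_combinations(parents)
    print(len(combos), "possible combinations")
    # Invented fragments sharing a 6 bp terminal overlap
    print(overlap_length("ACGTACGTAAGG", "GTAAGGTTTT"))
```

In practice `overlap_length` could be used to confirm that primer-designed fragment ends share at least the 10 to 20 bp mentioned in the text before they are pooled.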



Table 16

Vector + Fragment   Number of colonies   % of colonies with active Lipolase



SEQUENCE LISTING



(1) GENERAL INFORMATION:

(i) APPLICANT:
(A) NAME: Novo Nordisk A/S
(B) STREET: Novo Alle
(C) CITY: Bagsvaerd
(E) COUNTRY: Denmark
(F) POSTAL CODE (ZIP): DK-2880
(G) TELEPHONE: +45 4444 8888
(H) TELEFAX: +45 4449 3256

(ii) TITLE OF INVENTION: Method for preparing polypeptide variants
(iii) NUMBER OF SEQUENCES: 15

(iv) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.30B (EPO)



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Primer 2843"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

ACAAACATTA CGTGCACGGG 20

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Primer 4699"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

CGGTACCCGG GGATCCAC 18

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Primer 5164"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

AATTACATCA TGCGGCCC 18

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Primer 8487"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

CATTTGCTCC GGCTGCAGGG A 21

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 60 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Primer 4548"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

GTGTTCCGCC GGTCTGTACG GTCAGGAATT CTGCAAAAGC
CCTGTTTCCG ACTCGGGGGG 60

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Primer 5576"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

GGTCTGTACG GTCAGGAATT C 21

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Primer 5578"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

CGTTTCGGGT GACGGGGAC 19

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Primer 1596"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

GGAGCAAATG TCATTTAT 18

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 64 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Primer 4545"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

GCATTGGCAA CTGTTGCCGG AGCAGACCTG CGTGGAAATG
GGTATGATAT CGACGTGTTT TCAT 64

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 876 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Vector pJSO26"

(vi) ORIGINAL SOURCE:
(B) STRAIN: Humicola lanuginosa

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..876

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

ATG AGG AGC TCC CTT GTG CTG TTC TTT GTC TCT GCG TGG ACG GCC TTG 48
Met Arg Ser Ser Leu Val Leu Phe Phe Val Ser Ala Trp Thr Ala Leu
1 5 10 15

GCC AGT CCT ATT CGT CGA GAG GTC TCG CAG GAT CTG TTT AAC CAG TTC 96
Ala Ser Pro Ile Arg Arg Glu Val Ser Gln Asp Leu Phe Asn Gln Phe
20 25 30

AAT CTC TTT GCA CAG TAT TCT GCA GCC GCA TAC TGC GGA AAA AAC AAT 144
Asn Leu Phe Ala Gln Tyr Ser Ala Ala Ala Tyr Cys Gly Lys Asn Asn
35 40 45

GAT GCC CCA GCT GGT ACA AAC ATT ACG TGC ACG GGA AAT GCC TGC CCC 192
Asp Ala Pro Ala Gly Thr Asn Ile Thr Cys Thr Gly Asn Ala Cys Pro
50 55 60

GAG GTA GAG AAG GCG GAT GCA ACG TTT CTC TAC TCG TTT GAA GAC TCT 240
Glu Val Glu Lys Ala Asp Ala Thr Phe Leu Tyr Ser Phe Glu Asp Ser
65 70 75 80

GGA GTG GGC GAT GTC ACC GGC TTC CTT GCT CTC GAC AAC ACG AAC AAA 288
Gly Val Gly Asp Val Thr Gly Phe Leu Ala Leu Asp Asn Thr Asn Lys
85 90 95

TTG ATC GTC CTC TCT TTC CGT GGC TCT CGT TCC ATA GAG AAC TGG ATC 336
Leu Ile Val Leu Ser Phe Arg Gly Ser Arg Ser Ile Glu Asn Trp Ile
100 105 110

GGG AAT CTT AAC TTC GAC TTG AAA GAA ATA AAT GAC ATT TGC TCC GGC 384
Gly Asn Leu Asn Phe Asp Leu Lys Glu Ile Asn Asp Ile Cys Ser Gly
115 120 125


[...]
210 215 220

[...] [...] [...] [...] CCG CCG CGC GAA TTC GGT TAC AGC CAT TCT [...] [...]
Val Pro Arg Leu Pro Pro Arg Glu Phe Gly Tyr Ser His Ser Ser Pro
225 230 235 240



(2) INFORMATION FOR SEQ ID NO: 11: 



(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 292 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

Met Arg Ser Ser Leu Val Leu Phe Phe Val Ser Ala Trp Thr Ala Leu
1 5 10 15

Ala Ser Pro Ile Arg Arg Glu Val Ser Gln Asp Leu Phe Asn Gln Phe
20 25 30

Asn Leu Phe Ala Gln Tyr Ser Ala Ala Ala Tyr Cys Gly Lys Asn Asn
35 40 45

Asp Ala Pro Ala Gly Thr Asn Ile Thr Cys Thr Gly Asn Ala Cys Pro
50 55 60



432 



ACG [...] AGG [...] AAG GTG GAG GAT GCT GTG AGG GAG CAT [...] [...] [...] 480
Thr Leu Arg Gln Lys Val Glu Asp Ala Val Arg Glu His Pro Asp Tyr
145 150 155 160

io 25 82 S S S IS SS S£ 25 IS IS S S S Jg S ■» 

170 175 

l5 s is a stj es sj ^ is % s s s s s »« 

185 190 

£ IS S SS *S SS IS ffi J£S s s s su s s s «« 

20 ^ 00 205 

SS f£ S IS IS S S £ 2S S S SS S !2 S5 S •» 



720 



30 as SS S S SJ IS If? SS S ?S S !5 S 25 J* S» - 

250 255 

35 SS 5! K S IX gp S IS S S IS IS SS S£ SI 816 

265 270 

£S S S 2? S S S SS IS S IS S K IS 8 " 

40 280 285 

ACA TGT CTT TAG 876
Thr Cys Leu *
290

Glu Val Glu Lys Ala Asp Ala Thr Phe Leu Tyr Ser Phe Glu Asp Ser
65 70 75 80

Gly Val Gly Asp Val Thr Gly Phe Leu Ala Leu Asp Asn Thr Asn Lys
85 90 95

Leu Ile Val Leu Ser Phe Arg Gly Ser Arg Ser Ile Glu Asn Trp Ile
100 105 110

Gly Asn Leu Asn Phe Asp Leu Lys Glu Ile Asn Asp Ile Cys Ser Gly
115 120 125

Cys Arg Gly His Asp Gly Phe Thr Ser Ser Trp Arg Ser Val Ala Asp
130 135 140

Thr Leu Arg Gln Lys Val Glu Asp Ala Val Arg Glu His Pro Asp Tyr
145 150 155 160

Arg Val Val Phe Thr Gly His Ser Leu Gly Gly Ala Leu Ala Thr Val
165 170 175

Ala Gly Ala Asp Leu Arg Gly Asn Gly Tyr Asp Ile Asp Val Phe Ser
180 185 190

Tyr Gly Ala Pro Arg Val Gly Asn Arg Ala Phe Ala Glu Phe Leu Thr
195 200 205

Val Gln Thr Gly Gly Thr Leu Tyr Arg Ile Thr His Thr Asn Asp Ile
210 215 220

Val Pro Arg Leu Pro Pro Arg Glu Phe Gly Tyr Ser His Ser Ser Pro
225 230 235 240

Glu Tyr Trp Ile Lys Ser Gly Thr Leu Val Pro Val Thr Arg Asn Asp
245 250 255

Ile Val Lys Ile Glu Gly Ile Asp Ala Thr Gly Gly Asn Asn Gln Pro
260 265 270

Asn Ile Pro Asp Ile Pro Ala His Leu Trp Tyr Phe Gly Leu Ile Gly
275 280 285

Thr Cys Leu *
290



(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 876 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Vector pJSO37"

(vi) ORIGINAL SOURCE:
(B) STRAIN: Humicola lanuginosa

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..876

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

ATG AGG AGC TCC CTT GTG CTG TTC TTT GTC TCT GCG TGG ACG GCC TTG 48
Met Arg Ser Ser Leu Val Leu Phe Phe Val Ser Ala Trp Thr Ala Leu
1 5 10 15

GCC AGT CCT ATA CGT AGA GAG GTC TCG CAG GAT CTG TTT AAC CAG TTC 96
Ala Ser Pro Ile Arg Arg Glu Val Ser Gln Asp Leu Phe Asn Gln Phe
20 25 30

AAT CTC TTT GCA CAG TAT TCA GCT GCC GCA TAC TGC GGA AAA AAC AAT 144
Asn Leu Phe Ala Gln Tyr Ser Ala Ala Ala Tyr Cys Gly Lys Asn Asn
35 40 45

GAT GCC CCA GCA GGT ACA AAC ATT ACG TGC ACG GGA AAT GCA TGC CCC 192
Asp Ala Pro Ala Gly Thr Asn Ile Thr Cys Thr Gly Asn Ala Cys Pro
50 55 60

GAG GTA GAG AAG GCG GAT GCA ACG TTT CTC TAC TCG TTT GAA GAC TCT 240
Glu Val Glu Lys Ala Asp Ala Thr Phe Leu Tyr Ser Phe Glu Asp Ser
65 70 75 80

GGA GTG GGC GAT GTC ACC GGC TTC CTT GCT CTC GAC AAC ACG AAC AAG 288
Gly Val Gly Asp Val Thr Gly Phe Leu Ala Leu Asp Asn Thr Asn Lys
85 90 95

CTT ATC GTC CTC TCT TTC CGT GGC TCA AGA TCT ATA GAG AAC TGG ATC 336
Leu Ile Val Leu Ser Phe Arg Gly Ser Arg Ser Ile Glu Asn Trp Ile
100 105 110

GGG AAT CTT AAC TTC GAC TTG AAA GAA ATA AAT GAC ATT TGC TCC GGC 384
Gly Asn Leu Asn Phe Asp Leu Lys Glu Ile Asn Asp Ile Cys Ser Gly
115 120 125

TGC AGG GGA CAT GAC GGC TTC ACT TCG TCC TGG AGG TCT GTA GCC GAT 432
Cys Arg Gly His Asp Gly Phe Thr Ser Ser Trp Arg Ser Val Ala Asp
130 135 140

ACG TTA AGG CAG AAG GTG GAG GAT GCT GTT CGC GAG CAT CCC GAC TAT 480
Thr Leu Arg Gln Lys Val Glu Asp Ala Val Arg Glu His Pro Asp Tyr
145 150 155 160

CGC GTG GTG TTT ACC GGC CAT AGC CTT GGT GGT GCG CTA GCA ACT GTT 528
Arg Val Val Phe Thr Gly His Ser Leu Gly Gly Ala Leu Ala Thr Val
165 170 175

GCC GGA GCA GAC CTG CGT GGA AAT GGG TAT GAT ATC GAC GTG TTT TCA 576
Ala Gly Ala Asp Leu Arg Gly Asn Gly Tyr Asp Ile Asp Val Phe Ser
180 185 190

TAT GGC GCC CCC CGA GTC GGT AAC CGT GCT TTT GCA GAA TTC CTG ACC 624
Tyr Gly Ala Pro Arg Val Gly Asn Arg Ala Phe Ala Glu Phe Leu Thr
195 200 205

[...] [...] [...] [...] [...] [...] [...] [...] [...] [...] ACC CAC ACC [...] [...] [...] 672
Val Gln Thr Gly Gly Thr Leu Tyr Arg Ile Thr His Thr Asn Asp Ile
210 215 220

GTC CCT AGA CTC CCG CCT CGA GAA TTC GGT TAC AGC CAT TCT AGC CCA 720
Val Pro Arg Leu Pro Pro Arg Glu Phe Gly Tyr Ser His Ser Ser Pro
225 230 235 240

GAG TAC TGG ATC AAA TCT GGA ACA CTA GTC CCC GTC ACC CGA AAC GAT 768
Glu Tyr Trp Ile Lys Ser Gly Thr Leu Val Pro Val Thr Arg Asn Asp
245 250 255

ATC GTG AAG ATA GAA GGC ATC GAT GCC ACC GGC GGC AAT AAC CAG CCT 816
Ile Val Lys Ile Glu Gly Ile Asp Ala Thr Gly Gly Asn Asn Gln Pro
260 265 270

AAC ATT CCG GAT ATC CCT GCG CAC CTA TGG TAC TTC GGG TTA ATT GGG 864
Asn Ile Pro Asp Ile Pro Ala His Leu Trp Tyr Phe Gly Leu Ile Gly
275 280 285

ACA TGT CTT TAG 876
Thr Cys Leu *
290



(2) INFORMATION FOR SEQ ID NO: 13: 



(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 292 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

Met Arg Ser Ser Leu Val Leu Phe Phe Val Ser Ala Trp Thr Ala Leu
1 5 10 15

Ala Ser Pro Ile Arg Arg Glu Val Ser Gln Asp Leu Phe Asn Gln Phe
20 25 30

Asn Leu Phe Ala Gln Tyr Ser Ala Ala Ala Tyr Cys Gly Lys Asn Asn
35 40 45

Asp Ala Pro Ala Gly Thr Asn Ile Thr Cys Thr Gly Asn Ala Cys Pro
50 55 60

Glu Val Glu Lys Ala Asp Ala Thr Phe Leu Tyr Ser Phe Glu Asp Ser
65 70 75 80

Gly Val Gly Asp Val Thr Gly Phe Leu Ala Leu Asp Asn Thr Asn Lys
85 90 95

Leu Ile Val Leu Ser Phe Arg Gly Ser Arg Ser Ile Glu Asn Trp Ile
100 105 110

Gly Asn Leu Asn Phe Asp Leu Lys Glu Ile Asn Asp Ile Cys Ser Gly
115 120 125

Cys Arg Gly His Asp Gly Phe Thr Ser Ser Trp Arg Ser Val Ala Asp
130 135 140

Thr Leu Arg Gln Lys Val Glu Asp Ala Val Arg Glu His Pro Asp Tyr
145 150 155 160

Arg Val Val Phe Thr Gly His Ser Leu Gly Gly Ala Leu Ala Thr Val
165 170 175

Ala Gly Ala Asp Leu Arg Gly Asn Gly Tyr Asp Ile Asp Val Phe Ser
180 185 190

Tyr Gly Ala Pro Arg Val Gly Asn Arg Ala Phe Ala Glu Phe Leu Thr
195 200 205

Val Gln Thr Gly Gly Thr Leu Tyr Arg Ile Thr His Thr Asn Asp Ile
210 215 220

Val Pro Arg Leu Pro Pro Arg Glu Phe Gly Tyr Ser His Ser Ser Pro
225 230 235 240

Glu Tyr Trp Ile Lys Ser Gly Thr Leu Val Pro Val Thr Arg Asn Asp
245 250 255

Ile Val Lys Ile Glu Gly Ile Asp Ala Thr Gly Gly Asn Asn Gln Pro
260 265 270

Asn Ile Pro Asp Ile Pro Ala His Leu Trp Tyr Phe Gly Leu Ile Gly
275 280 285

Thr Cys Leu *
290



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 864 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:
(B) STRAIN: Pseudomonas sp.

(ix) FEATURE:
(A) NAME/KEY: mat_peptide
(B) LOCATION: 1..864

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..864

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

[...] [...] [...] [...] [...] TAC ACC [...] ACC CAG TAC CCG ATC GTC CTG ACC 48
Phe Gly Ser Ser Asn Tyr Thr Lys Thr Gln Tyr Pro Ile Val Leu Thr
1 5 10 15

[...] [...] ATG CTC GGT TTC GAC AGC CTG CTT GGA GTC GAC TAC TGG TAC 96
His Gly Met Leu Gly Phe Asp Ser Leu Leu Gly Val Asp Tyr Trp Tyr
20 25 30

[...] [...] [...] [...] GCC CTG CGT [...] GAC GGC GCC ACC GTC TAC GTC ACC 144
Gly Ile Pro Ser Ala Leu Arg Lys Asp Gly Ala Thr Val Tyr Val Thr
35 40 45

[...] [...] [...] [...] CTC GAC ACC TCC GAA GCC CGA GGT GAG CAA CTG CTG 192
Glu Val Ser Gln Leu Asp Thr Ser Glu Ala Arg Gly Glu Gln Leu Leu
50 55 60

[...] [...] [...] [...] [...] [...] GTG GCC ATC AGC GGC [...] [...] AAG GTC AAC 240
Thr Gln Val Glu Glu Ile Val Ala Ile Ser Gly Lys Pro Lys Val Asn
65 70 75 80

CTG TTC GGC CAC AGC CAT GGC GGG CCT ACC ATC CGC TAC GTT GCC GCC 288
Leu Phe Gly His Ser His Gly Gly Pro Thr Ile Arg Tyr Val Ala Ala
85 90 95

GTG CGC CCG GAT CTG GTC GCC TCG GTC ACC AGC ATT GGC GCG CCG CAC 336
Val Arg Pro Asp Leu Val Ala Ser Val Thr Ser Ile Gly Ala Pro His
100 105 110

AAG GGT TCG GCC ACC GCC GAC TTC ATC CGC CAG GTG CCG GAA GGA TCG 384
Lys Gly Ser Ala Thr Ala Asp Phe Ile Arg Gln Val Pro Glu Gly Ser
115 120 125

GCC AGC GAA GCG ATT CTG GCC GGG ATC GTC AAT GGT CTG GGT GCG CTG 432
Ala Ser Glu Ala Ile Leu Ala Gly Ile Val Asn Gly Leu Gly Ala Leu
130 135 140

ATC AAC TTC CTT TCC GGC AGC AGT TCG GAC ACC CCA CAG AAC TCG CTG 480
Ile Asn Phe Leu Ser Gly Ser Ser Ser Asp Thr Pro Gln Asn Ser Leu
145 150 155 160

GGC ACG CTG GAG TCA CTG AAC TCC GAA GGC GCC GCA CGG TTT AAC GCC 528
Gly Thr Leu Glu Ser Leu Asn Ser Glu Gly Ala Ala Arg Phe Asn Ala
165 170 175

CGC TTC CCC CAG GGG GTA CCA ACC AGC GCC TGC GGC GAG GGC GAT TAC 576
Arg Phe Pro Gln Gly Val Pro Thr Ser Ala Cys Gly Glu Gly Asp Tyr
180 185 190

GTG GTC AAT GGC GTG CGC TAT TAC TCC TGG AGG GGC ACC AGC CCG CTG 624
Val Val Asn Gly Val Arg Tyr Tyr Ser Trp Arg Gly Thr Ser Pro Leu
195 200 205

ACC AAC GTA CTC GAC CCC TCC GAC CTG CTG CTC GGC GCC ACC TCC CTG 672
Thr Asn Val Leu Asp Pro Ser Asp Leu Leu Leu Gly Ala Thr Ser Leu
210 215 220

ACC TTC GGT TTC GAG GCC AAC GAT GGT CTG GTC GGA CGC TGC AGC TCC 720
Thr Phe Gly Phe Glu Ala Asn Asp Gly Leu Val Gly Arg Cys Ser Ser
225 230 235 240

CGG CTG GGT ATG GTG ATC CGC GAC AAC TAC CGG ATG AAC CAC CTG GAC 768
Arg Leu Gly Met Val Ile Arg Asp Asn Tyr Arg Met Asn His Leu Asp
245 250 255

GAG GTG AAC CAG ACC TTC GGG CTG ACC AGC ATC TTC GAG ACC AGC CCG 816
Glu Val Asn Gln Thr Phe Gly Leu Thr Ser Ile Phe Glu Thr Ser Pro
260 265 270

GTA TCG GTC TAT CGC CAG CAA GCC AAT CGC CTG AAG AAC GCC GGG CTC 864
Val Ser Val Tyr Arg Gln Gln Ala Asn Arg Leu Lys Asn Ala Gly Leu
275 280 285



(2) INFORMATION FOR SEQ ID NO: 15: 



(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 288 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

Phe Gly Ser Ser Asn Tyr Thr Lys Thr Gln Tyr Pro Ile Val Leu Thr
1 5 10 15

His Gly Met Leu Gly Phe Asp Ser Leu Leu Gly Val Asp Tyr Trp Tyr
20 25 30

Gly Ile Pro Ser Ala Leu Arg Lys Asp Gly Ala Thr Val Tyr Val Thr
35 40 45

Glu Val Ser Gln Leu Asp Thr Ser Glu Ala Arg Gly Glu Gln Leu Leu
50 55 60

Thr Gln Val Glu Glu Ile Val Ala Ile Ser Gly Lys Pro Lys Val Asn
65 70 75 80

Leu Phe Gly His Ser His Gly Gly Pro Thr Ile Arg Tyr Val Ala Ala
85 90 95

Val Arg Pro Asp Leu Val Ala Ser Val Thr Ser Ile Gly Ala Pro His
100 105 110

Lys Gly Ser Ala Thr Ala Asp Phe Ile Arg Gln Val Pro Glu Gly Ser
115 120 125

Ala Ser Glu Ala Ile Leu Ala Gly Ile Val Asn Gly Leu Gly Ala Leu
130 135 140

Ile Asn Phe Leu Ser Gly Ser Ser Ser Asp Thr Pro Gln Asn Ser Leu
145 150 155 160

Gly Thr Leu Glu Ser Leu Asn Ser Glu Gly Ala Ala Arg Phe Asn Ala
165 170 175

Arg Phe Pro Gln Gly Val Pro Thr Ser Ala Cys Gly Glu Gly Asp Tyr
180 185 190

Val Val Asn Gly Val Arg Tyr Tyr Ser Trp Arg Gly Thr Ser Pro Leu
195 200 205

Thr Asn Val Leu Asp Pro Ser Asp Leu Leu Leu Gly Ala Thr Ser Leu
210 215 220

Thr Phe Gly Phe Glu Ala Asn Asp Gly Leu Val Gly Arg Cys Ser Ser
225 230 235 240

Arg Leu Gly Met Val Ile Arg Asp Asn Tyr Arg Met Asn His Leu Asp
245 250 255

Glu Val Asn Gln Thr Phe Gly Leu Thr Ser Ile Phe Glu Thr Ser Pro
260 265 270

Val Ser Val Tyr Arg Gln Gln Ala Asn Arg Leu Lys Asn Ala Gly Leu
275 280 285

PATENT CLAIMS

1. A method for preparing polypeptide variants by shuffling
different nucleotide sequences of homologous DNA sequences by in
vivo recombination, comprising the steps of

a) forming at least one circular plasmid comprising a DNA sequence
encoding a polypeptide,

b) opening said circular plasmid(s) within the DNA sequence(s)
encoding the polypeptide(s),

c) preparing at least one DNA fragment comprising a DNA sequence
homologous to at least a part of the polypeptide coding region on at
least one of the circular plasmid(s),

d) introducing at least one of said opened plasmid(s), together with
at least one of said homologous DNA fragment(s) covering full-length
DNA sequences encoding said polypeptide(s) or parts thereof, into a
recombination host cell,

e) cultivating said recombination host cell, and

f) screening for positive polypeptide variants.

2. The method according to claim 1, wherein more than one cycle of
steps a) to f) is performed.

3. The method according to claims 1 and 2, wherein two or more
opened plasmids are shuffled with one or more homologous DNA
fragments in the same shuffling cycle.

4. The method according to any of claims 1 to 3, wherein the opened
plasmid(s) is(are) gapped.

5. The method according to any of claims 1 to 4, wherein the ratio
between the opened plasmid(s) and homologous DNA fragment(s) is in
the range from 20:1 to 1:50, preferably from 2:1 to 1:10 (mol
vector:mol fragments), with the specific concentrations being from 1
pM to 10 M of the DNA.

6. The method according to any of claims 1 to 5, wherein 2 or more,
preferably from 2 to 6, especially 2 to 4 of the DNA fragments have
partially overlapping regions.

7. The method according to claim 6, wherein the overlapping regions
of the DNA fragments lie in the range from 5 to 5000 bp, preferably
from 10 bp to 500 bp, especially 10 bp to 100 bp.

8. The method according to any of claims 1 and 8, wherein at least
one cycle of steps a) to f) is backcrossing with the initially used
DNA fragments.

9. The method according to any of claims 1 and 8, wherein the
plasmid(s) is(are) opened in the region around the middle of the DNA
sequence(s) encoding the polypeptide(s).

10. The method according to any of claims 1 to 9, wherein the
plasmid(s) is(are) opened close to a mutation in the DNA sequence(s)
encoding the polypeptide(s).

11. The method according to any of claims 1 to 10, wherein the DNA
fragment(s) prepared in step c) is(are) prepared under conditions
suitable for high, medium or low mutagenesis.

12. The method according to any of claims 1 to 11, wherein the
polypeptides producible from the input DNA sequences are enzymes or
proteins with biological activity.

13. The method according to claim 12, wherein the polypeptides are
enzymes selected from the group including proteases, lipases,
cutinases, cellulases, amylases, peroxidases, oxidases and phytases.

14. The method according to claim 12, wherein the polypeptides are
proteins with biological activity selected from the group including
[...], somatostatin, somatotropin, thymosin,
parathyroid hormone, pigmentary hormones, somatomedin,
erythropoietin, luteinizing hormone, chorionic gonadotropin,
hypothalamic [...] factors, antidiuretic hormones, thyroid stimulating
hormone, relaxin, interferon, thrombopoietin (TPO) and prolactin.

15. The method according to any of claims 1 to 13, wherein at least
one of the initially used input DNA sequences is a wild-type DNA
sequence, such as a DNA sequence coding for wild-type enzymes, in
particular lipases, derived from filamentous fungi, such as Humicola
sp., in particular Humicola lanuginosa, especially Humicola
lanuginosa DSM 4109.

16. The method according to claim 15, wherein at least one of the 
input DNA sequences is selected from the group of vectors (a) to (f) 

5 and/or DNA fragments (g) to (aa) coding for .Humlcola lanuginosa 
lipase variants . 

17. The method according to any of claims 1 to 13, wherein at least 
one of the initially used input DNA sequences is a wild-type DNA 

10 sequence, such as a DNA sequence coding for wild-type enzymes, in 
particular lipases, derived from filamentous fungi of the genera 
Absldla, Rhlzopus , Emerlcella , Aspergillus , Penlcllllum, 
Eupenlcllllum, Paecllomyces , Talaromyces, Thermoascus and 
Scleroclelsta . 

15 

18. The method according to any of claims 1 to 13, wherein at least 
one of the initially used input DNA sequences is a wild-type DNA 
sequence, such as a DNA sequence coding for wild-type enzymes, in 
particular lipases, derived from bacteria, such as Pseudomonas sp., 
in particular Ps. fragi, Ps. stutzeri, Ps. cepacia, Ps. fluorescens, 
Ps. plantarii, Ps. gladioli, Ps. alcaligenes, Ps. pseudoalcaligenes, 
Ps. mendocina, Ps. aeruginosa, Ps. glumae, Ps. syringae, Ps. 
wisconsinensis, or a strain of Bacillus sp., in particular B. 
subtilis, B. stearothermophilus or B. pumilus, or a strain of 
Streptomyces sp., in particular S. scabies, or a strain of 
Chromobacterium sp., in particular C. viscosum. 

19. The method according to any of claims 1 to 13, wherein at least 
one of the initially used input DNA sequences is a variant DNA 
sequence, such as a DNA sequence coding for a variant enzyme, in 
particular lipase variants, derived from yeasts, such as Candida 
sp., in particular Candida rugosa, or Geotrichum sp., in particular 
Geotrichum candidum. 

20. The method according to any of claims 1 to 19, wherein the 
homologous input DNA sequences are at least 60%, preferably at least 
70%, better more than 80%, especially more than 90%, and even up to 
100% homologous. 

21. The method according to any of claims 1 to 20, wherein the 
recombination host cell is a eukaryotic cell, such as a fungal cell 
or a plant cell. 



22. The method according to claim 21, wherein said fungal cell is a 
yeast cell from the group of cells of Saccharomyces sp., in 
particular strains of Saccharomyces cerevisiae or Saccharomyces 
kluyveri, or Schizosaccharomyces sp., in particular 
Schizosaccharomyces pombe, or Kluyveromyces sp., such as K. lactis, 
or Hansenula sp., in particular H. polymorpha, or Pichia sp., in 
particular P. pastoris, or a filamentous fungus from the group of 
Aspergillus sp., in particular A. niger, A. nidulans or A. oryzae, 
or Neurospora sp., or Fusarium sp., in particular F. oxysporum, or 
Trichoderma sp. 

23. The method according to any of claims 1 to 22, wherein the 
plasmid DNA sequence(s) coding for the polypeptide(s) is (are) 
operably linked to a replication sequence. 

24. The method according to claim 23, wherein the plasmid DNA 
sequence(s) encoding the polypeptide(s) is (are) operably linked to a 
functional promoter sequence. 

25. The method according to claim 24, wherein the plasmid is an 
expression plasmid. 

26. The method according to claim 25, wherein the expression 
plasmid is pJS026 or pJS037. 

ATGAGGAGCTCCCTTGTGCTGTTCTTTGTCTCTGCGTGGACGGCCTTGGCCAGTCCTATT 
5447 ---------+---------+---------+---------+---------+---------+ 5506 
1 MRSSLVLFFVSAWTALASPI 

CGTCGAGAGGTCTCGCAGGATCTGTTTAACCAGTTCAATCTCTTTGCACAGTATTCTGCA 
5507 ---------+---------+---------+---------+---------+---------+ 5566 
21 RREVSQDLFNQFNLFAQYSA 

GCCGCATACTGCGGAAAAAACAATGATGCCCCAGCTGGTACAAACATTACGTGCACGGGA 
5567 ---------+---------+---------+---------+---------+---------+ 5626 
41 AAYCGKNNDAPAGTNITCTG 

AATGCCTGCCCCGAGGTAGAGAAGGCGGATGCAACGTTTCTCTACTCGTTTGAAGACTCT 
5627 ---------+---------+---------+---------+---------+---------+ 5686 
61 NACPEVEKADATFLYSFEDS 

GGAGTGGGCGATGTCACCGGCTTCCTTGCTCTCGACAACACGAACAAATTGATCGTCCTC 
5687 ---------+---------+---------+---------+---------+---------+ 5746 
81 GVGDVTGFLALDNTNKLIVL 

TCTTTCCGTGGCTCTCGTTCCATAGAGAACTGGATCGGGAATCTTAACTTCGACTTGAAA 
5747 ---------+---------+---------+---------+---------+---------+ 5806 
101 SFRGSRSIENWIGNLNFDLK 

GAAATAAATGACATTTGCTCCGGCTGCAGGGGACATGACGGCTTCACTTCGTCCTGGAGG 
5807 ---------+---------+---------+---------+---------+---------+ 5866 
121 EINDICSGCRGHDGFTSSWR 

TCTGTAGCCGATACGTTAAGGCAGAAGGTGGAGGATGCTGTGAGGGAGCATCCCGACTAT 
5867 ---------+---------+---------+---------+---------+---------+ 5926 
141 SVADTLRQKVEDAVREHPDY 

CGCGTGGTGTTTACCGGACATAGCTTGGGTGGTGCATTGGCAACTGTTGCCGGAGCAGAC 
5927 ---------+---------+---------+---------+---------+---------+ 5986 
161 RVVFTGHSLGGALATVAGAD 

CTGCGTGGAAATGGGTATGATATCGACGTGTTTTCATATGGCGCCCCCCGAGTCGGAAAC 
5987 ---------+---------+---------+---------+---------+---------+ 6046 
181 LRGNGYDIDVFSYGAPRVGN 

AGGGCTTTTGCAGAATTCCTGACCGTACAGACCGGCGGAACACTCTACCGCATTACCCAC 
6047 ---------+---------+---------+---------+---------+---------+ 6106 
201 RAFAEFLTVQTGGTLYRITH 

ACCAATGATATTGTCCCTAGACTCCCGCCGCGCGAATTCGGTTACAGCCATTCTAGCCCA 
6107 ---------+---------+---------+---------+---------+---------+ 6166 
221 TNDIVPRLPPREFGYSHSSP 

GAGTACTGGATCAAATCTGGAACCCTTGTCCCCGTCACCCGAAACGATATCGTGAAGATA 
6167 ---------+---------+---------+---------+---------+---------+ 6226 
241 EYWIKSGTLVPVTRNDIVKI 

GAAGGCATCGATGCCACCGGCGGCAATAACCAGCCTAACATTCCGGATATCCCTGCGCAC 
6227 ---------+---------+---------+---------+---------+---------+ 6286 
261 EGIDATGGNNQPNIPDIPAH 

CTATGGTACTTCGGGTTAATTGGGACATGTCTTTAG 
6287 ---------+---------+---------+------ 6322 
281 LWYFGLIGTCL* 
5447 [...]GAGC[...]CTTGTGCTGTTCTTTGTCTCTGCGTGGACGGCCTTGGCCAGTCCTATA 5506 
1 [...]LVLFFVSAWTALASPI 

5507 CGTAGAGAGGTCTCGCAGGATCTGTTTAACCAGTTCAATCTCTTTGCACAGTATTC[...] 5566 
21 RREVSQDLFNQFNLFAQYSA 

5567 GCA[...]AC[...]GCGGAAAA[...]CAATGA[...] 5626 
41 A[...]KNNDAPAG[...]N[...]CTG 

5627 [...]CCGAGGTAGAGAAGGCGGATGCAACGTTTCTCTACTCGTTTGAAGACTCT 5686 
61 NACPEVEKADATFLYSFEDS 

5687 [...]CGA[...]TCACCGGCTTCCTTGCTCTCGACAACACGAAC[...] 5746 
81 GVG[...]TGFLALDNTNK[...] 

BglII 
5747 [...]CTATAGAG[...]CTGGATCGGGAATCTT[...]TTCGACTTGAAA 5806 
101 SFRGSRSIENWIGNLNFDLK 

5807 [...]GCTCCGGCTGCAGGGGACATGACGGCTTCACTTCGTCCTGGAGG 5866 
121 EINDICSGCRGHDGFTSSWR 

5867 TCTGTAGCCGATACGTTAAGGCAGAAGGTGGAGGATGCTGTTCGCGAGCATCCCGACTAT 5926 
141 SVADTLRQKVEDAVREHPDY 

BstXI NheI 
5927 [...]ACCGGCCATAGCCTT[...] 5986 
161 RVVFTGHSL[...]GAD 

5987 [...]TGGGTATGATATCGACGTGTTTTCATATGGCGCCCCCCGAGTC[...] 6046 
181 LRGNGYDIDVFSYGAPRV[...] 

KpnI 
6047 CGTGCTTTTGCAGAATTCCTGACCGTACAGACCGGCGGTACCCTCTACCGCATTACCCAC 6106 
201 RAFAEFLTVQTGGTLYRITH 

XhoI 
6107 [...]CCCTA[...]AATTCGGTT[...] 6166 
221 TNDIVPRLPPREFGYSHSSP 

SpeI 
6167 [...]TGGAACACTAGTCCCCGTCACCCGAAACGATATCGTGAAGATA 6226 
241 EYWIKSGTLVPVTRNDIVKI 

6227 [...]GCCACCGGC[...]CCAATAACCAGCCTAACATTCCGGATATCCCTGCGCAC 6286 
261 EGIDATGGNNQPNIPDIPAH 

6287 CTATGGTACTTCGGGTTAATTGGGACATGTCTTTAG 6322 
281 LWYFGLIGTCL* 